Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/12 06:27:11 UTC

[GitHub] [hudi] alexeykudinkin opened a new pull request, #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

alexeykudinkin opened a new pull request, #5296:
URL: https://github.com/apache/hudi/pull/5296

   ## What is the purpose of the pull request
   
   Fixes `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle` when an old record is carried over from the existing file as is.
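
To make the failure mode concrete, here is a minimal sketch of the intended behaviour. It is not the Hudi implementation: plain `Map`s stand in for Avro records, and the names `CarryOverSketch`, `carryOver`, `commit_time` and `file_name` are hypothetical, chosen only for this illustration. The point is that a record carried over from the old file keeps its other metadata, while its file-name field is pointed at the file it is now written to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: plain maps replace Avro records here.
final class CarryOverSketch {
    // Hypothetical metadata field names, used for this sketch only.
    static final String COMMIT_TIME_FIELD = "commit_time";
    static final String FILE_NAME_FIELD = "file_name";

    /**
     * Returns the record to write into the new file: every field is kept,
     * except the file-name field, which is pointed at the new file.
     * The input record is left untouched.
     */
    static Map<String, Object> carryOver(Map<String, Object> oldRecord, String newFileName) {
        Map<String, Object> toWrite = new LinkedHashMap<>(oldRecord);
        toWrite.put(FILE_NAME_FIELD, newFileName);
        return toWrite;
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put(COMMIT_TIME_FIELD, "001");
        oldRecord.put(FILE_NAME_FIELD, "file-v1");
        oldRecord.put("value", 42);

        Map<String, Object> written = carryOver(oldRecord, "file-v2");
        System.out.println(written);
    }
}
```

Without the override in `carryOver`, the written record would still name `file-v1`, which is the stale reference this PR is about.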
   
   ## Brief change log
   
    - Revisited the `HoodieFileWriter` API to accept `HoodieKey` instead of `HoodieRecord`
    - Fixed `FILENAME_METADATA_FIELD` not being overridden when an old record is simply carried over
    - Exposed the standard JVM debugger ports in the Docker setup
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   This change added tests and can be verified as follows:
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096165047

   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096197206

   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002) 
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1097141842

   ## CI report:
   
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * df54e1dd20f6e602c177a00295ac7ca616d7d029 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8005) 
   * 0028d5a8fc860de4f222a35f80323003b69b957b UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1097153105

   ## CI report:
   
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * df54e1dd20f6e602c177a00295ac7ca616d7d029 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8005) 
   * 0028d5a8fc860de4f222a35f80323003b69b957b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8022) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096396627

   ## CI report:
   
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * df54e1dd20f6e602c177a00295ac7ca616d7d029 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8005) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096256827

   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002) 
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * df54e1dd20f6e602c177a00295ac7ca616d7d029 UNKNOWN
   


[GitHub] [hudi] alexeykudinkin commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1097260340

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1097337679

   ## CI report:
   
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * 0028d5a8fc860de4f222a35f80323003b69b957b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8022) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8023) 
   


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5296:
URL: https://github.com/apache/hudi/pull/5296#discussion_r848809893


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -354,12 +349,7 @@ public void write(GenericRecord oldRecord) {
     if (copyOldRecord) {
       // this should work as it is, since this is an existing record
       try {
-        // rewrite file names
-        // do not preserve FILENAME_METADATA_FIELD
-        if (preserveMetadata && useWriterSchemaForCompaction) {
-          oldRecord.put(HoodieRecord.FILENAME_METADATA_FIELD_POS, newFilePath.getName());
-        }
-        fileWriter.writeAvro(key, oldRecord);

Review Comment:
   There are obviously exceptions to this rule, but they require guarding to make sure that the in-place changes do not escape the scope they were originally intended for.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -354,12 +349,7 @@ public void write(GenericRecord oldRecord) {
     if (copyOldRecord) {
       // this should work as it is, since this is an existing record
       try {
-        // rewrite file names
-        // do not preserve FILENAME_METADATA_FIELD
-        if (preserveMetadata && useWriterSchemaForCompaction) {
-          oldRecord.put(HoodieRecord.FILENAME_METADATA_FIELD_POS, newFilePath.getName());
-        }
-        fileWriter.writeAvro(key, oldRecord);

Review Comment:
   First and foremost, we should not be updating the passed-in record in place* (i.e., we should treat it as immutable). Rewriting the record essentially clones it, which allows us to modify the copy.
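
The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not Hudi or Avro code: plain `Map`s stand in for Avro's `GenericRecord`, the copy is shallow, and `CopyBeforeMutateSketch`, `stampInPlace` and `stampOnCopy` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only: plain maps replace Avro records here.
final class CopyBeforeMutateSketch {
    /** Mutates the caller's record: every holder of the reference sees the change. */
    static Map<String, Object> stampInPlace(Map<String, Object> record, String fileName) {
        record.put("file_name", fileName);
        return record;
    }

    /** Treats the input as immutable: only the returned (shallow) copy carries the change. */
    static Map<String, Object> stampOnCopy(Map<String, Object> record, String fileName) {
        Map<String, Object> copy = new HashMap<>(record);
        copy.put("file_name", fileName);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put("file_name", "old-file");

        Map<String, Object> out = stampOnCopy(input, "new-file");
        // The caller's record still names the old file; only the copy was changed.
        System.out.println(input.get("file_name") + " / " + out.get("file_name"));
    }
}
```

With `stampInPlace`, any other code still holding `input` would observe the new file name, which is the kind of escape from the intended scope that needs guarding.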





[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1097226431

   ## CI report:
   
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * 0028d5a8fc860de4f222a35f80323003b69b957b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8022) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5296:
URL: https://github.com/apache/hudi/pull/5296#discussion_r848452061


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -354,12 +349,7 @@ public void write(GenericRecord oldRecord) {
     if (copyOldRecord) {
       // this should work as it is, since this is an existing record
       try {
-        // rewrite file names
-        // do not preserve FILENAME_METADATA_FIELD
-        if (preserveMetadata && useWriterSchemaForCompaction) {
-          oldRecord.put(HoodieRecord.FILENAME_METADATA_FIELD_POS, newFilePath.getName());
-        }
-        fileWriter.writeAvro(key, oldRecord);

Review Comment:
   In this flow, do we need to do both?
   - rewriteRecord(avroRecord)
   - writeAvroWithMetadata
   
   I guess only the second one would suffice (L369).
   I understand that for L296 we need both, but here we don't need to rewrite the record when `preserveMetadata` is false.
   





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5296:
URL: https://github.com/apache/hudi/pull/5296#discussion_r849932758


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -370,6 +360,16 @@ public void write(GenericRecord oldRecord) {
     }
   }
 
+  protected void writeToFile(HoodieKey key, GenericRecord avroRecord, boolean shouldPreserveRecordMetadata) throws IOException {
+    if (shouldPreserveRecordMetadata) {
+      // NOTE: `FILENAME_METADATA_FIELD` has to be rewritten to correctly point to the

Review Comment:
   Totally. Previously it relied on `preserveMetadata` fields, which were not available in `CreateHandle`, so I had to bail on moving it to the `WriteHandle` class; then, after refactoring, I forgot to update `CreateHandle`.





[GitHub] [hudi] danny0405 commented on a diff in pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5296:
URL: https://github.com/apache/hudi/pull/5296#discussion_r849990018


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -370,6 +360,16 @@ public void write(GenericRecord oldRecord) {
     }
   }
 
+  protected void writeToFile(HoodieKey key, GenericRecord avroRecord, boolean shouldPreserveRecordMetadata) throws IOException {
+    if (shouldPreserveRecordMetadata) {
+      // NOTE: `FILENAME_METADATA_FIELD` has to be rewritten to correctly point to the

Review Comment:
   It would be great if you could do that.





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #5296:
URL: https://github.com/apache/hudi/pull/5296#discussion_r849933100


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -370,6 +360,16 @@ public void write(GenericRecord oldRecord) {
     }
   }
 
+  protected void writeToFile(HoodieKey key, GenericRecord avroRecord, boolean shouldPreserveRecordMetadata) throws IOException {
+    if (shouldPreserveRecordMetadata) {
+      // NOTE: `FILENAME_METADATA_FIELD` has to be rewritten to correctly point to the

Review Comment:
   I do want to review all handles more holistically and clean up quite a bit of the duplication and unnecessary complication that we've amassed.





[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096159804

   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096207473

   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002) 
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1097263759

   ## CI report:
   
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * 0028d5a8fc860de4f222a35f80323003b69b957b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8022) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8023) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096263191

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e5d566882c6f3ed58a65a01065f0ae99dfb420b2",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002",
       "triggerID" : "e5d566882c6f3ed58a65a01065f0ae99dfb420b2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9458d847182b0628d228211d010310ade743d431",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9458d847182b0628d228211d010310ade743d431",
       "triggerType" : "PUSH"
     }, {
       "hash" : "df54e1dd20f6e602c177a00295ac7ca616d7d029",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8005",
       "triggerID" : "df54e1dd20f6e602c177a00295ac7ca616d7d029",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002) 
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   * df54e1dd20f6e602c177a00295ac7ca616d7d029 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8005) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan merged pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #5296:
URL: https://github.com/apache/hudi/pull/5296




[GitHub] [hudi] danny0405 commented on a diff in pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5296:
URL: https://github.com/apache/hudi/pull/5296#discussion_r849017448


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -370,6 +360,16 @@ public void write(GenericRecord oldRecord) {
     }
   }
 
+  protected void writeToFile(HoodieKey key, GenericRecord avroRecord, boolean shouldPreserveRecordMetadata) throws IOException {
+    if (shouldPreserveRecordMetadata) {
+      // NOTE: `FILENAME_METADATA_FIELD` has to be rewritten to correctly point to the

Review Comment:
   Can the create handle reuse this method?
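
For context, the NOTE in the hunk above concerns records carried over unchanged from an existing file: their metadata is preserved, but the file-name meta field must still be rewritten to the file being written. A minimal sketch of that idea follows, assuming a record modelled as a plain `Map` rather than an Avro `GenericRecord`; the class `FileNameRewriteSketch` and the method `carryOver` are hypothetical and are not part of Hudi or of this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a record is modelled as a Map instead of an
// Avro GenericRecord, and none of these names belong to Hudi's API.
final class FileNameRewriteSketch {

  // Name of Hudi's file-name meta column (the value of FILENAME_METADATA_FIELD).
  static final String FILENAME_METADATA_FIELD = "_hoodie_file_name";

  // Carries an old record over into a new file: every field, including the
  // other meta fields, is preserved, and only the file-name field is
  // overwritten so that it points at the file being written.
  static Map<String, Object> carryOver(Map<String, Object> oldRecord, String newFileName) {
    Map<String, Object> copy = new HashMap<>(oldRecord); // leave the input untouched
    copy.put(FILENAME_METADATA_FIELD, newFileName);
    return copy;
  }

  public static void main(String[] args) {
    Map<String, Object> oldRecord = new HashMap<>();
    oldRecord.put(FILENAME_METADATA_FIELD, "old-file.parquet");
    oldRecord.put("_hoodie_commit_time", "001");

    Map<String, Object> written = carryOver(oldRecord, "new-file.parquet");
    System.out.println(written.get(FILENAME_METADATA_FIELD));
    System.out.println(written.get("_hoodie_commit_time"));
  }
}
```

Whether the create handle could share such a helper, as asked above, depends on whether it also writes records with preserved metadata; the sketch does not settle that.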


