Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/13 09:13:29 UTC

[GitHub] [hudi] xiarixiaoyao opened a new pull request, #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

xiarixiaoyao opened a new pull request, #6816:
URL: https://github.com/apache/hudi/pull/6816

   ### Change Logs
   
   Add an integrity check of the merged parquet file for HoodieMergeHandle.
   
   In our current production environment, cluster instability makes it easy for Hudi to write a corrupt parquet file, which makes the entire table unavailable, e.g.:
   Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@183284b
   	at org.apache.parquet.format.Util.read(Util.java:365)
   	at org.apache.parquet.format.Util.readPageHeader(Util.java:132)
   	at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readPageHeader(ParquetFileReader.java:1382)
   
   
   After Hudi finishes writing the parquet file, run a simple integrity check on it to ensure that no corrupt file enters the Hudi table.
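
A minimal sketch of what a cheap post-write check could look like, assuming an unencrypted parquet file on a local filesystem. This is not the check added by this PR (the diff is not shown in this thread), and the class and method names are hypothetical. It only verifies the leading and trailing `PAR1` magic and the footer length, so it would catch a truncated file but probably not a damaged page header like the one in the stack trace above; detecting that would require actually reading the pages, for example with parquet-mr's `ParquetFileReader`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Hudi or parquet-mr.
public class ParquetIntegritySketch {
    // Unencrypted parquet files start and end with the 4-byte magic "PAR1".
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file has both magics and a plausible footer length. */
    public static boolean looksStructurallyValid(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            // Smallest possible layout: head magic + footer length + tail magic.
            if (len < 12) {
                return false;
            }
            byte[] head = new byte[4];
            raf.readFully(head);

            // Last 8 bytes: 4-byte footer length, then the tail magic.
            byte[] tail = new byte[8];
            raf.seek(len - 8);
            raf.readFully(tail);

            // Footer length is a little-endian 32-bit integer.
            long footerLen = (tail[0] & 0xFFL)
                | ((tail[1] & 0xFFL) << 8)
                | ((tail[2] & 0xFFL) << 16)
                | ((tail[3] & 0xFFL) << 24);

            boolean magicOk = Arrays.equals(head, MAGIC)
                && Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC);

            // The footer must fit between the head magic and the 8 trailing bytes.
            return magicOk && footerLen > 0 && footerLen <= len - 12;
        }
    }
}
```

A writer could call `ParquetIntegritySketch.looksStructurallyValid(path)` right after closing the file and fail the write if it returns false, so the corrupt file never becomes visible to readers.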
   
   
   ### Impact
   
   Risk level: none
   
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277305947

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   * 4b262c2672f63ef7d988be35b62055fc6f952cc7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277313427

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183",
       "triggerID" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   * 4b262c2672f63ef7d988be35b62055fc6f952cc7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183) 
   * 226aad3bb81a393696878cf84eb605378e88a7de UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] xiarixiaoyao closed pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao closed pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.
URL: https://github.com/apache/hudi/pull/6816


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277388749

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183",
       "triggerID" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   * 4b262c2672f63ef7d988be35b62055fc6f952cc7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183) 
   * 226aad3bb81a393696878cf84eb605378e88a7de UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277299009

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277395558

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183",
       "triggerID" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "930fffb5584203f6a33b73830f77e9ef1027f0ab",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "930fffb5584203f6a33b73830f77e9ef1027f0ab",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   * 4b262c2672f63ef7d988be35b62055fc6f952cc7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183) 
   * 226aad3bb81a393696878cf84eb605378e88a7de UNKNOWN
   * 930fffb5584203f6a33b73830f77e9ef1027f0ab UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277688011

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183",
       "triggerID" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "930fffb5584203f6a33b73830f77e9ef1027f0ab",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12186",
       "triggerID" : "930fffb5584203f6a33b73830f77e9ef1027f0ab",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   * 226aad3bb81a393696878cf84eb605378e88a7de UNKNOWN
   * 930fffb5584203f6a33b73830f77e9ef1027f0ab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12186) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #6816: [MINOR]add integrity check of merged parquet file for HoodieMergeHandle.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6816:
URL: https://github.com/apache/hudi/pull/6816#issuecomment-1277406144

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6031e0de844bfc48745caed8795e2de15e5970af",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183",
       "triggerID" : "4b262c2672f63ef7d988be35b62055fc6f952cc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "226aad3bb81a393696878cf84eb605378e88a7de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "930fffb5584203f6a33b73830f77e9ef1027f0ab",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12186",
       "triggerID" : "930fffb5584203f6a33b73830f77e9ef1027f0ab",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6031e0de844bfc48745caed8795e2de15e5970af UNKNOWN
   * 4b262c2672f63ef7d988be35b62055fc6f952cc7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12183) 
   * 226aad3bb81a393696878cf84eb605378e88a7de UNKNOWN
   * 930fffb5584203f6a33b73830f77e9ef1027f0ab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12186) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

