Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/04 06:21:18 UTC

[GitHub] [hudi] vingov opened a new pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

vingov opened a new pull request #4503:
URL: https://github.com/apache/hudi/pull/4503


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *This pull request adds the implementation details for the Hudi BigQuery integration RFC-34.*
   
   ## Brief change log
   
     - *RFC-34 Hudi BigQuery Integration details were updated.*
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [x] CI is green
   
    - [x] Necessary doc changes done or have another open PR
          
    - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1004580607


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022755412


   ## CI report:
   
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086796882


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086792623


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] xushiyan commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086835544


   A reminder to call out the limitations and sample configs in the website update PR, as a guide for using this feature.
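
   As a rough illustration of what such sample configs might look like in the website guide, here is a hypothetical invocation of the BigQuery sync tool described in RFC-34. The class name, bundle jar, and every option below are illustrative assumptions, not confirmed flags of the released tool; the actual guide should list the supported options.

   ```shell
   # Hypothetical sketch only: option names and the bundle jar are
   # assumptions based on the RFC draft, not verified CLI flags.
   spark-submit \
     --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
     hudi-gcp-bundle.jar \
     --project-id my-gcp-project \
     --dataset-name my_dataset \
     --dataset-location us-west1 \
     --table my_hudi_table \
     --base-path gs://my-bucket/my_hudi_table \
     --partitioned-by datestr
   ```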





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086791307


   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022780659


   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022756937


   ## CI report:
   
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022660299


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vingov commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1060273027


   > @vingov are we tracking this for 0.11 as planned? Could you please open a WIP PR when ready, so we can prioritize this!
   
   Yes, I will submit the PR within the next two weeks.





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086791759


   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086797674


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   * d392a93e4f21ff85cb4a57ad401971dd16a60541 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086800007


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   * d392a93e4f21ff85cb4a57ad401971dd16a60541 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7785) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086793026


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1004558109


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1020614643


   @vingov What are the current blockers for making this work end-to-end? Just the `.hoodie_partition_metadata` filtering?
   
   @prashantwason shared an interesting idea: make that file an empty parquet file and put the contents of the current file into the footer. We can provide an upgrade utility to do this as well in 0.11, and we should be good here.





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022733967


   ## CI report:
   
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1004557125


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022695629


   ## CI report:
   
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022756937


   ## CI report:
   
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086791307


   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1016577721


   Looping in @troykershaw, who is also interested in exploring this.
   
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022656270


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   * 0ec28cc6531761ee4e10700940560129d9016bb4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086797674


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   * d392a93e4f21ff85cb4a57ad401971dd16a60541 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086791759


   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1058001264


   >The next blocker would be finding a generic way to exclude .hoodie folder which works for any partitioned/non-partitioned table.
   
   Love to understand this. Could you list out the bandaids the PoC needs atm?
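
   For discussion purposes, a generic `.hoodie` exclusion filter could be sketched as follows. This is an illustrative sketch only, not part of the PoC; the helper name and path conventions are assumed:

   ```python
   from pathlib import PurePosixPath

   def is_syncable_data_file(path: str) -> bool:
       """Illustrative filter: skip anything under Hudi's .hoodie metadata folder,
       as well as partition-level marker files such as .hoodie_partition_metadata.
       Because every path component is checked (rather than a fixed depth), the
       same check works for partitioned and non-partitioned table layouts."""
       return not any(part.startswith(".hoodie") for part in PurePosixPath(path).parts)
   ```

   For example, `is_syncable_data_file("tbl/.hoodie/20220104.commit")` and `is_syncable_data_file("tbl/2022/01/.hoodie_partition_metadata")` both return `False`, while a partition-level parquet path passes.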





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1004557125


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1004580607


   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r786984729



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) the Apache Hudi file format, but
+it does support the Parquet file format. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync a
+Hudi table as a BigQuery external Parquet table so that users can query the Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, syncing Hudi’s parquet files to BigQuery and querying them without Hudi’s
+metadata layer will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are latest and filter only the latest version of parquet files to BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. Spark datasource & DeltaStreamer
+can already take a list of such classes to keep these manifests synced.
+
+###           
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, so we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'
+```
+
+2. As part of the BigQuery sync, the sync tool will generate/update the manifest files inside the .hoodie metadata folder.
+   For tables which already exist, you can generate a manifest file for the Hudi table that lists the latest
+   snapshot parquet file names in CSV format, with a single column containing the file name. The manifest file
+   will live in the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`).
+
+```
+// this command is coming soon.
+GENERATE symlink_format_manifest FOR TABLE dwh.bq_demo_partitioned_cow;
+```
+
+3. Create a BQ table named `hudi_table_name_manifest` with a single `filename` column backed by
+   `gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+(
+  filename STRING
+)
+OPTIONS(
+  format="CSV",
+  uris=["gs://hudi_datasets/bq_demo_partitioned_cow/.hoodie/manifest/latest_snapshot_files.csv"]
+);
+```
+
+4. Create another BQ table named `hudi_table_name_history` with the location `gs://bucket_name/table_name`. Don't use
+   this table to query the data: it will contain duplicate records, since it scans all versions of the parquet files
+   in the table/partition folders.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_history`
+WITH 
+  PARTITION COLUMNS 
+  OPTIONS(
+    ignore_unknown_values=true, 
+    format="PARQUET", 
+    hive_partition_uri_prefix="gs://hudi_datasets/bq_demo_partitioned_cow/",
+    uris=["gs://hudi_snowflake/bq_demo_partitioned_cow/dt=*"]
+  );
+```
+
+5. Create a BQ view with the same Hudi table name using the query below. This view exposes the data from the Hudi
+   table without any duplicates; use this view to query the data.
+
+```
+CREATE VIEW `my-first-project.dwh.bq_demo_partitioned_cow` AS 
+  SELECT
+  *
+  FROM
+  `my-first-project.dwh.bq_demo_partitioned_cow_history`
+  WHERE
+  _hoodie_file_name IN (
+    SELECT 
+      filename 
+    FROM
+      `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+  );
+```
+
+The BigQuerySync tool will
+use [HoodieTableMetaClient](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java)
+methods to get the list of the latest parquet data files and generate the manifest CSV file, and then invoke
+the [BigQuery Java Client](https://github.com/googleapis/java-bigquery/blob/main/samples/snippets/src/main/java/com/example/bigquery/CreateTableExternalHivePartitioned.java)
+to create the manifest table, the history table, and the Hudi table view.
+
+**All the steps described here will be automated, all you have to do is to supply a bunch of configs to enable the
+BigQuery sync.**
+
+## Rollout/Adoption Plan
+
+There are no impacts to existing users, since this is entirely a new feature to support a new use case; hence, no
+migrations or behavior changes are required.
+
+After the BigQuery sync tool has been implemented, I will reach out to Uber's Hudi/BigQuery team to roll out this feature
+for their BigQuery ingestion service.
+
+## Test Plan
+
+This RFC aims to implement a new SyncTool to sync the Hudi table to BigQuery. To test this feature, test tables will be
+created and updated in BigQuery, along with unit tests for the code. Since this is an entirely new
+feature, I am confident that this will not cause any regressions during and after rollout.
+
+## Future Plans
+
+After this feature has been rolled out, the same model can be applied to sync the Hudi tables to other external data

Review comment:
       Given BigQuery has a serverless model, I am guessing, we could even directly update a native bigquery table to create the views. May be note that here? Nonethless, this is good baseline for syncing to other systems as well. 
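The core of the manifest generation discussed in this hunk — picking, per file group, only the parquet file written by the most recent commit — can be sketched independently of BigQuery. The sketch below is a hypothetical illustration, not Hudi's actual API: it assumes Hudi CoW base-file names of the form `<fileId>_<writeToken>_<instantTime>.parquet` and the helper name `latest_snapshot_files` is invented for this example.

```python
import re

# Hudi CoW base file names look like "<fileId>_<writeToken>_<instantTime>.parquet".
# To build the latest-snapshot manifest we keep, per file group (fileId),
# only the file written by the most recent commit (instantTime).
NAME_RE = re.compile(r"^(?P<file_id>[^_]+)_(?P<token>[^_]+)_(?P<instant>\d+)\.parquet$")

def latest_snapshot_files(filenames):
    latest = {}  # file_id -> (instant, filename)
    for name in filenames:
        m = NAME_RE.match(name)
        if not m:
            continue  # skip log/metadata files that are not base parquet files
        fid, instant = m.group("file_id"), m.group("instant")
        # Instant times are fixed-width timestamps, so string comparison
        # orders them chronologically.
        if fid not in latest or instant > latest[fid][0]:
            latest[fid] = (instant, name)
    return sorted(fname for _, fname in latest.values())
```

The returned list is exactly what the proposed `latest_snapshot_files.csv` manifest would contain, one file name per row.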

##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi file format, but
+it has support for the Parquet file format. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync the
+Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, syncing Hudi’s parquet files to BigQuery and querying them without Hudi’s
+metadata layer will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are latest and filter only the latest version of parquet files to BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. Spark datasource & DeltaStreamer
+can already take a list of such classes to keep these manifests synced.
+
+###           
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, so we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'
+```
+
+2. As part of the BigQuery sync, the sync tool will generate/update the manifest files inside the .hoodie metadata folder.
+   For tables which already exist, you can generate a manifest file for the Hudi table that lists the latest
+   snapshot parquet file names in CSV format, with a single column containing the file name. The manifest file
+   will live in the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`).
+
+```
+// this command is coming soon.

Review comment:
       this would be a Spark SQL statement?

##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi file format, but

Review comment:
       Can we not deem Hudi a "file format" ? :) 

##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi file format, but
+it has support for the Parquet file format. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync the
+Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, syncing Hudi’s parquet files to BigQuery and querying them without Hudi’s
+metadata layer will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are latest and filter only the latest version of parquet files to BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. Spark datasource & DeltaStreamer
+can already take a list of such classes to keep these manifests synced.
+
+###           
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, so we need to drop the partition columns from the

Review comment:
       this is little weird. We could make a case for this being relaxed. No other WH needs this.
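The behavior behind `hoodie.datasource.write.drop.partition.columns = 'true'` that this comment discusses can be sketched as follows. This is a simplified illustration under assumptions, not Hudi's writer code: the helper name `split_partition` is invented, and the point is only that the partition value ends up in the hive-style path rather than in the parquet payload, so BigQuery does not see the same column both in the file schema and in the URI.

```python
def split_partition(row, partition_fields):
    """Emulate hoodie.datasource.write.drop.partition.columns=true:
    partition values move into the hive-style partition path and are
    removed from the record that lands in the parquet file."""
    path = "/".join(f"{k}={row[k]}" for k in partition_fields)
    payload = {k: v for k, v in row.items() if k not in partition_fields}
    return path, payload

# A row with dt='2022-01-04' is written under .../dt=2022-01-04/ with
# only {id, name, ...} columns in the parquet schema.
```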







[GitHub] [hudi] xushiyan commented on a change in pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r819217760



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,166 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi, but it has
+support for the Parquet and other formats. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync the
+Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, syncing Hudi’s parquet files to BigQuery and querying them without Hudi’s
+metadata layer will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are latest and filter only the latest version of parquet files to BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. Spark datasource & DeltaStreamer
+can already take a list of such classes to keep these manifests synced.
+
+###           
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, so we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'
+```
+
+2. As part of the BigQuery sync, the sync tool will generate/update the manifest files inside the .hoodie metadata folder.
+   For tables which already exist, you can generate a manifest file for the Hudi table that lists the latest
+   snapshot parquet file names in CSV format, with a single column containing the file name. The manifest file
+   will live in the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`).
+
+```
+// this command is coming soon.
+// the alternative for this command could be a JAVA API to generate the manifest.
+GENERATE symlink_format_manifest FOR TABLE dwh.bq_demo_partitioned_cow;
+```
+
+3. Create a BQ table named `hudi_table_name_manifest` with a single `filename` column backed by
+   `gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+(
+  filename STRING
+)
+OPTIONS(
+  format="CSV",
+  uris=["gs://hudi_datasets/bq_demo_partitioned_cow/.hoodie/manifest/latest_snapshot_files.csv"]
+);
+```
+
+4. Create another BQ table named `hudi_table_name_history` with the location `gs://bucket_name/table_name`. Don't use
+   this table to query the data: it will contain duplicate records, since it scans all versions of the parquet files
+   in the table/partition folders.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_history`
+WITH 
+  PARTITION COLUMNS 
+  OPTIONS(
+    ignore_unknown_values=true, 
+    format="PARQUET", 
+    hive_partition_uri_prefix="gs://hudi_datasets/bq_demo_partitioned_cow/",
+    uris=["gs://hudi_snowflake/bq_demo_partitioned_cow/dt=*"]
+  );
+```
+
+5. Create a BQ view with the same Hudi table name using the query below. This view exposes the data from the Hudi
+   table without any duplicates; use this view to query the data.

Review comment:
       this implementation only makes BQ work with COW tables, right? better state the limitation clearly in some section. We need to clarify it in release notes, too.







[GitHub] [hudi] xushiyan commented on a change in pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r841202806



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,166 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi, but it has
+support for the Parquet and other formats. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync the
+Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, syncing Hudi’s parquet files to BigQuery and querying them without Hudi’s
+metadata layer will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are latest and filter only the latest version of parquet files to BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. Spark datasource & DeltaStreamer
+can already take a list of such classes to keep these manifests synced.
+
+###           
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, so we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'

Review comment:
       We should also highlight that hive style partition = true and only partitioned tables are applicable
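The hive-style-partitioning requirement this comment raises comes from how BigQuery's `hive_partition_uri_prefix` option works: partition columns are inferred from `key=value` path segments between the prefix and the file name. A rough emulation (an illustrative sketch only; the helper name `parse_hive_partitions` is invented and this is not BigQuery's actual implementation):

```python
def parse_hive_partitions(uri, prefix):
    """Derive partition column/value pairs from a hive-style layout,
    roughly as BigQuery does under hive_partition_uri_prefix: every
    path segment of the form key=value between the prefix and the
    file name becomes a partition column."""
    if not uri.startswith(prefix):
        raise ValueError("uri must start with the partition prefix")
    rel = uri[len(prefix):].strip("/")
    segments = rel.split("/")[:-1]  # drop the file name itself
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)
```

This also shows why non-hive-style or non-partitioned layouts cannot feed the `WITH PARTITION COLUMNS` external table: there are no `key=value` segments to infer from.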

##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,166 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi, but it has
+support for the Parquet and other formats. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync the
+Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, syncing Hudi’s parquet files to BigQuery and querying them without Hudi’s
+metadata layer will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are latest and filter only the latest version of parquet files to BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. Spark datasource & DeltaStreamer
+can already take a list of such classes to keep these manifests synced.
+
+###           
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, so we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'
+```
+
+2. As part of the BigQuery sync, the sync tool will generate/update the manifest files inside the .hoodie metadata folder.
+   For tables which already exist, you can generate a manifest file for the Hudi table that lists the latest
+   snapshot parquet file names in CSV format, with a single column containing the file name. The manifest file
+   will live in the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`).
+
+```
+// this command is coming soon.
+// alternatively, a Java API could be provided to generate the manifest.
+GENERATE symlink_format_manifest FOR TABLE dwh.bq_demo_partitioned_cow;
+```
+
+3. Create a BQ table named `hudi_table_name_manifest` with a single `filename` column, using the location
+   `gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+(
+  filename STRING
+)
+OPTIONS(
+  format="CSV",
+  uris=["gs://hudi_datasets/bq_demo_partitioned_cow/.hoodie/manifest/latest_snapshot_files.csv"]
+);
+```
+
+4. Create another BQ table named `hudi_table_name_history` with the location `gs://bucket_name/table_name`. Do not
+   use this table to query the data; it will contain duplicate records, since it scans all the versions of parquet files
+   in the table/partition folders.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_history`
+WITH 
+  PARTITION COLUMNS 
+  OPTIONS(
+    ignore_unknown_values=true, 
+    format="PARQUET", 
+    hive_partition_uri_prefix="gs://hudi_datasets/bq_demo_partitioned_cow/",
+    uris=["gs://hudi_datasets/bq_demo_partitioned_cow/dt=*"]
+  );
+```
+
+5. Create a BQ view with the same Hudi table name using this query. The view has the data from the Hudi
+   table without any duplicates; use this view to query the data.

Review comment:
       Feels like we need to have a limitation section to call out this and others




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vingov commented on a change in pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r793180989



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi file format, but
+it has support for the Parquet file format. The proposal is to implement a BigQuerySync similar to HiveSync to sync the
+Hudi table as the BigQuery External Parquet table, so that users can query the Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS, and how the core primitives and timeline activities
+are implemented on top of such an organization (i.e. how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e. how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g. parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g. parquet) + row-based (e.g. avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn't support Hudi yet, if you sync Hudi's parquet files to BigQuery and query them without Hudi's
+metadata layer, the query will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool which will use the Hudi metadata to know
+which files are the latest, and expose only the latest version of the parquet files to the BigQuery external table so that users can
+query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement a new sync tool named BigQuerySyncTool, extending
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java),
+with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep these manifests synced with the BigQuery manifest table. The Spark datasource & DeltaStreamer
+can already take a list of such sync tool classes to keep these manifests synced.
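The latest-snapshot selection at the heart of this sync can be sketched in a few lines. The sketch below is illustrative only (stdlib-only Python), not the proposed Java implementation: it assumes Hudi's CoW base-file naming convention `<fileId>_<writeToken>_<instantTime>.parquet` and keeps, per file group, the file written by the greatest commit instant; the real tool would use Hudi's timeline/file-system view rather than parsing file names.

```python
def latest_snapshot_files(filenames):
    """Keep only the newest base file per Hudi file group.

    Assumes CoW base files named <fileId>_<writeToken>_<instantTime>.parquet,
    so within a file group the file with the greatest instantTime is the
    latest snapshot.
    """
    latest = {}
    for name in filenames:
        file_id, _write_token, instant = name[: -len(".parquet")].split("_", 2)
        if file_id not in latest or instant > latest[file_id][0]:
            latest[file_id] = (instant, name)
    return sorted(name for _, name in latest.values())


files = [
    "f1_0-10-20_20220101010101.parquet",
    "f1_0-11-21_20220102020202.parquet",  # newer commit for file group f1
    "f2_0-10-22_20220101010101.parquet",
]
print(latest_snapshot_files(files))
# → ['f1_0-11-21_20220102020202.parquet', 'f2_0-10-22_20220101010101.parquet']
```

Only the files this selection returns make it into the manifest, which is what keeps the external table free of duplicate rows.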
+
+### Architecture
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records when querying the Hudi CoW table, we need to generate the list of latest snapshot files, create a BQ
+table for it, and then use that table to filter the duplicate records out of the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, hence we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'
+```
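Because the partition column is dropped from the parquet schema, its value travels only in the hive-style directory name (`dt=...`), which BigQuery reconstructs via `hive_partition_uri_prefix`. A tiny illustrative sketch (not part of the proposed tool) of how such values are recovered from a path:

```python
def partition_values(path):
    """Parse hive-style key=value segments out of a storage path."""
    pairs = (seg.split("=", 1) for seg in path.split("/") if "=" in seg)
    return {key: value for key, value in pairs}


print(partition_values(
    "gs://hudi_datasets/bq_demo_partitioned_cow/dt=2021-12-01/abc.parquet"))
# → {'dt': '2021-12-01'}
```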
+
+2. As part of the BigQuerySync, the sync tool will generate/update the manifest files inside the .hoodie metadata
+   folder. For tables which already exist, you can generate a manifest file for the Hudi table listing the latest
+   snapshot parquet file names, in CSV format with a single column, the file name. The manifest file will live in
+   the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`).
+
+```
+// this command is coming soon.
+GENERATE symlink_format_manifest FOR TABLE dwh.bq_demo_partitioned_cow;
+```
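Until that command lands, the manifest can be produced by a small script. The sketch below is hypothetical (stdlib-only, writing to the local filesystem; a real implementation would write the object to GCS through a storage client) and only illustrates the layout and single-column CSV format described above:

```python
import csv
import os


def write_manifest(table_path, latest_files):
    """Write the single-column CSV manifest under <table>/.hoodie/manifest/."""
    manifest_dir = os.path.join(table_path, ".hoodie", "manifest")
    os.makedirs(manifest_dir, exist_ok=True)
    manifest_path = os.path.join(manifest_dir, "latest_snapshot_files.csv")
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        for name in sorted(latest_files):
            writer.writerow([name])  # one column: the parquet file name
    return manifest_path
```

The external manifest table in the next step then points at this single CSV object.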
+
+3. Create a BQ table named `hudi_table_name_manifest` with a single `filename` column, using the location
+   `gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+(
+  filename STRING
+)
+OPTIONS(
+  format="CSV",
+  uris=["gs://hudi_datasets/bq_demo_partitioned_cow/.hoodie/manifest/latest_snapshot_files.csv"]
+);
+```
+
+4. Create another BQ table named `hudi_table_name_history` with the location `gs://bucket_name/table_name`. Do not
+   use this table to query the data; it will contain duplicate records, since it scans all the versions of parquet files
+   in the table/partition folders.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_history`
+WITH 
+  PARTITION COLUMNS 
+  OPTIONS(
+    ignore_unknown_values=true, 
+    format="PARQUET", 
+    hive_partition_uri_prefix="gs://hudi_datasets/bq_demo_partitioned_cow/",
+    uris=["gs://hudi_datasets/bq_demo_partitioned_cow/dt=*"]
+  );
+```
+
+5. Create a BQ view with the same Hudi table name using this query. The view has the data from the Hudi
+   table without any duplicates; use this view to query the data.
+
+```
+CREATE VIEW `my-first-project.dwh.bq_demo_partitioned_cow` AS 
+  SELECT
+  *
+  FROM
+  `my-first-project.dwh.bq_demo_partitioned_cow_history`
+  WHERE
+  _hoodie_file_name IN (
+    SELECT 
+      filename 
+    FROM
+      `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+  );
+```
+
+The BigQuerySync tool will
+use [HoodieTableMetaClient](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java)
+methods to get the latest set of parquet data files and generate the manifest CSV file, and will then invoke
+the [BigQuery Java Client](https://github.com/googleapis/java-bigquery/blob/main/samples/snippets/src/main/java/com/example/bigquery/CreateTableExternalHivePartitioned.java)
+to create the manifest table, the history table, and the Hudi table view.
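For illustration only, the DDL the tool submits for the dedup view could be templatized like this (hypothetical helper; the table names mirror the example above, and the real tool would issue the statement through the BigQuery Java client rather than build strings in Python):

```python
def dedup_view_ddl(project, dataset, table):
    """Build the CREATE VIEW statement that filters the history table down
    to the latest snapshot files listed in the manifest table."""
    base = f"{project}.{dataset}.{table}"
    return (
        f"CREATE VIEW `{base}` AS "
        f"SELECT * FROM `{base}_history` "
        f"WHERE _hoodie_file_name IN "
        f"(SELECT filename FROM `{base}_manifest`)"
    )


print(dedup_view_ddl("my-first-project", "dwh", "bq_demo_partitioned_cow"))
```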
+
+**All the steps described here will be automated; all you have to do is supply a few configs to enable the
+BigQuery sync.**
+
+## Rollout/Adoption Plan
+
+There is no impact on existing users, since this is an entirely new feature supporting a new use case; no
+migrations or behavior changes are required.
+
+After the BigQuery sync tool has been implemented, I will reach out to Uber's Hudi/BigQuery team to roll out this
+feature for their BigQuery ingestion service.
+
+## Test Plan
+
+This RFC aims to implement a new SyncTool to sync the Hudi table to BigQuery. To test this feature, some test
+tables will be created and updated on BigQuery, along with unit tests for the code. Since this is an entirely new
+feature, I am confident that this will not cause any regressions during and after rollout.
+
+## Future Plans
+
+After this feature has been rolled out, the same model can be applied to sync the Hudi tables to other external data

Review comment:
       Yes, you are correct, manifest would be helpful for integrating both BigQuery and Snowflake, but if it is just for BigQuery, we can update their native table and use that table for filtering out the latest snapshot parquet files.

##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+
+BigQuery doesn't accept the partition column in the parquet schema, hence we need to drop the partition columns from the

Review comment:
       Yes, we already brought this up with the BigQuery team, but they were not open to changing their design.

##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+2. As part of the BigQuerySync, the sync tool will generate/update the manifest files inside the .hoodie metadata
+   folder. For tables which already exist, you can generate a manifest file for the Hudi table listing the latest
+   snapshot parquet file names, in CSV format with a single column, the file name. The manifest file will live in
+   the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`).
+
+```
+// this command is coming soon.

Review comment:
       Yes, the alternative for this command could be a JAVA API to generate the manifest file, this is mainly useful for already existing hudi datasets.







[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022755412


   ## CI report:
   
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086804302


   ## CI report:
   
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   * d392a93e4f21ff85cb4a57ad401971dd16a60541 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7785) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086793026


   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086792216


   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1020614643


   @vingov What are the current blockers for making this work end-to-end? Just the `.hoodie_partition_metadata` filtering?
   
   @prashantwason shared an interesting idea: make that file an empty parquet file and put the contents of the current file into the footers. We can provide an upgrade utility for this in 0.11 as well, and we should be good here.





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022660299


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022658379


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022658379


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022695629


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] xushiyan merged pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
xushiyan merged pull request #4503:
URL: https://github.com/apache/hudi/pull/4503


   





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086796527


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783",
       "triggerID" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1057999837


   @vingov are we tracking this for 0.11 as planned? Could you please open a WIP PR when ready, so we can prioritize this!





[GitHub] [hudi] vingov commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1060272575


   > > The next blocker would be finding a generic way to exclude .hoodie folder which works for any partitioned/non-partitioned table.
   > 
   > Love to understand this. Could you list out the bandaids the PoC needs atm?
   
   These are the current blockers:
   
   1. Exclude the .hoodie folder while creating the BQ table/view.
   2. Convert .hoodie_partition_metadata to an empty parquet file and keep the metadata in the footer.
   3. Generate the list of the latest snapshot parquet files in a CSV format.
   4. When the partition column is dropped from the parquet files (hoodie.datasource.write.drop.partition.columns=true), the latest Hudi version complains; we need to fix that issue.
   
   I will track all the blockers in the Jira and update the main umbrella epic.
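   For blocker 3, the core of the manifest generation is picking the newest base file per file group. A stdlib-only sketch of that selection (the `fileId_writeToken_instantTime.parquet` naming follows Hudi's base-file convention; the helper names are made up for illustration):

```python
import csv
import os
import re

# Hudi base files are named <fileId>_<writeToken>_<instantTime>.parquet;
# the latest file slice of a file group is the one with the highest instant.
BASE_FILE_RE = re.compile(
    r"^(?P<file_id>.+)_(?P<token>\d+-\d+-\d+)_(?P<instant>\d+)\.parquet$"
)

def latest_snapshot_files(filenames):
    """Return the newest parquet file per file group from a list of paths."""
    latest = {}
    for name in filenames:
        m = BASE_FILE_RE.match(os.path.basename(name))
        if not m:
            continue  # skip .hoodie_partition_metadata, log files, etc.
        file_id, instant = m.group("file_id"), m.group("instant")
        kept = latest.get(file_id)
        if kept is None or instant > kept[0]:
            latest[file_id] = (instant, name)
    return sorted(name for _, name in latest.values())

def write_manifest(filenames, manifest_path):
    """Write the one-column CSV manifest described in the RFC."""
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f)
        for name in latest_snapshot_files(filenames):
            writer.writerow([name])
```

   The real implementation would of course walk the timeline via Hudi's metadata APIs rather than parse file names, but the selection logic is the same.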





[GitHub] [hudi] vingov commented on a change in pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r820421418



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,166 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi, but it has
+support for Parquet and other formats. The proposal is to implement a BigQuerySync, similar to HiveSync, to sync the
+Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber is
+already syncing some of its Hudi tables to a BigQuery data mart; this feature will help them write, sync and query.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of such organization (i.e how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, if you sync Hudi’s parquet files to BigQuery and query them without Hudi’s
+metadata layer, the query will scan all versions of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal is to implement a BigQuery sync tool that uses the Hudi metadata to identify
+the latest files and expose only the latest version of the parquet files to the BigQuery external table, so that users
+can query the Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java)
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
+named BigQuerySyncTool, with sync methods for CoW tables. The sync implementation will identify the latest parquet files
+for each .commit file and keep the resulting manifest synced with the BigQuery manifest table. The Spark datasource &
+DeltaStreamer can already take a list of such sync classes to keep these manifests synced.
+
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records on the Hudi CoW table, we need to generate the list of latest snapshot files and create a BQ
+table for it, then use that table to filter the duplicate records from the history table.
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
+
+BigQuery doesn't accept the partition column in the parquet schema, hence we need to drop the partition columns from the
+schema by enabling this flag:
+
+```
+hoodie.datasource.write.drop.partition.columns = 'true'
+```
+
+2. As part of the BigQuerySync, the sync tool will generate/update the manifest files inside the .hoodie metadata
+   folder. For tables which already exist, you can generate a manifest file for the Hudi table containing the list of
+   the latest snapshot parquet file names in CSV format, with a single column holding the file name. The manifest file
+   will live in the .hoodie metadata folder (`gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`)
+
+```
+// This command is coming soon;
+// an alternative could be a Java API to generate the manifest.
+GENERATE symlink_format_manifest FOR TABLE dwh.bq_demo_partitioned_cow;
+```
+
+3. Create a BQ table named `hudi_table_name_manifest` with only one column, `filename`, at this location:
+   `gs://bucket_name/table_name/.hoodie/manifest/latest_snapshot_files.csv`.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
+(
+  filename STRING
+)
+OPTIONS(
+  format="CSV",
+  uris=["gs://hudi_datasets/bq_demo_partitioned_cow/.hoodie/manifest/latest_snapshot_files.csv"]
+);
+```
+
+4. Create another BQ table named `hudi_table_name_history` at the location `gs://bucket_name/table_name`. Don't use
+   this table to query the data; it will have duplicate records, since it scans all versions of the parquet files
+   in the table/partition folders.
+
+```
+CREATE EXTERNAL TABLE `my-first-project.dwh.bq_demo_partitioned_cow_history`
+WITH 
+  PARTITION COLUMNS 
+  OPTIONS(
+    ignore_unknown_values=true, 
+    format="PARQUET", 
+    hive_partition_uri_prefix="gs://hudi_datasets/bq_demo_partitioned_cow/",
+    uris=["gs://hudi_datasets/bq_demo_partitioned_cow/dt=*"]
+  );
+```
+
+5. Create a BQ view with the same Hudi table name using this query. The view has the data from the Hudi
+   table without any duplicates; use it to query the data.

Review comment:
       Yes, it supports only CoW tables; I will add this limitation in the docs.
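       For reference, the deduplicating view from step 5 could look roughly like this (table names follow the earlier examples; filtering the `_hoodie_file_name` metadata column against the manifest is one way to do it, and the exact column choice is an assumption):

```
CREATE VIEW `my-first-project.dwh.bq_demo_partitioned_cow` AS
SELECT *
FROM `my-first-project.dwh.bq_demo_partitioned_cow_history`
WHERE _hoodie_file_name IN (
  SELECT filename
  FROM `my-first-project.dwh.bq_demo_partitioned_cow_manifest`
);
```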







[GitHub] [hudi] vingov commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022647337


   Yes @vinothchandar, I spoke with @prashantwason and created a follow-up ticket ([HUDI-3290](https://issues.apache.org/jira/browse/HUDI-3290)) to solve the `.hoodie_partition_metadata` problem.
   
   The next blocker would be finding a generic way to exclude the `.hoodie` folder that works for any partitioned/non-partitioned table.
   
   I've started on the implementation of the BigQuery sync code; the PR should be available for review soon.
   
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022662307


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vingov commented on a change in pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r793236316



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) Apache Hudi file format, but

Review comment:
       Yep, my bad, changed the language.







[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022733967


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022780659


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022662307


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0ec28cc6531761ee4e10700940560129d9016bb4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542) 
   * 4f51ef51999c35d29f08fcefc5ae6af72b35eef1 UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1022656270


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   * 0ec28cc6531761ee4e10700940560129d9016bb4 UNKNOWN
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086792216


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783",
       "triggerID" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0cbc964f6c87d2b66a585b358a7dc9dab28f0418 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549) 
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086792623


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783",
       "triggerID" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   





[GitHub] [hudi] xushiyan commented on a change in pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#discussion_r841202661



##########
File path: rfc/rfc-34/rfc-34.md
##########
@@ -0,0 +1,165 @@
+# Hudi BigQuery Integration
+
+## Abstract
+
+BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run
+analytics over vast amounts of data in near real time. BigQuery
+currently [doesn’t support](https://cloud.google.com/bigquery/external-data-cloud-storage) the Apache Hudi file format,
+but it does support the Parquet file format. The proposal is to implement a BigQuerySync, similar to HiveSync, that syncs
+a Hudi table as a BigQuery external Parquet table, so that users can query Hudi tables using BigQuery. Uber already
+syncs some of its Hudi tables to a BigQuery data mart; this integration will help them write, sync, and query that data.
+
+## Background
+
+Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities
+are implemented on top of that organization (i.e. how data is written). In turn, query types define how the underlying
+data is exposed to the queries (i.e. how data is read).
+
+Hudi supports the following table types:
+
+* [Copy On Write](https://hudi.apache.org/docs/table_types#copy-on-write-table): Stores data using exclusively columnar
+  file formats (e.g. parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
+* [Merge On Read](https://hudi.apache.org/docs/table_types#merge-on-read-table): Stores data using a combination of
+  columnar (e.g. parquet) + row-based (e.g. avro) file formats. Updates are logged to delta files & later compacted to
+  produce new versions of columnar files synchronously or asynchronously.
+
+Hudi maintains multiple versions of the Parquet files and tracks the latest version using Hudi metadata (CoW). Since
+BigQuery doesn’t support Hudi yet, if you sync Hudi’s parquet files to BigQuery and query them without Hudi’s
+metadata layer, the query will scan every version of the parquet files, which can produce duplicate rows.
+
+To avoid the above scenario, this proposal implements a BigQuery sync tool that uses the Hudi metadata to identify
+which files are latest and exposes only the latest version of the parquet files to the BigQuery external table, so that
+users can query Hudi tables without any duplicate records.
+
+## Implementation
+
+This new feature will implement
+the [AbstractSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncTool.java),
+similar to
+the [HiveSyncTool](https://github.com/apache/hudi/blob/master/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java),
+as a new BigQuerySyncTool with sync methods for CoW tables. The sync implementation will identify the latest parquet
+files for each .commit file and keep the resulting manifests synced with the BigQuery manifest table. The Spark
+datasource & DeltaStreamer can already take a list of such sync classes to keep the manifests up to date.
+
+### Architecture
+
+![alt_text](big-query-arch.png "Big Query integration architecture.")
+
+To avoid duplicate records when querying the Hudi CoW table, we need to generate the list of the latest snapshot files,
+create a BQ table for it, and then use that table to filter the duplicate records out of the history table.
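The manifest-and-filter step above can be sketched as follows. This is an illustrative Python sketch, not the Hudi API: file-group IDs, commit times, and file names are simplified stand-ins for what the sync tool would read from the Hudi timeline.

```python
# Sketch: pick each file group's file from its latest commit, then use that
# manifest to drop stale rows from a full listing of all file versions.

def latest_snapshot_manifest(file_groups):
    """file_groups: {file_group_id: [(commit_time, file_name), ...]}.
    Returns the set of file names written by the latest commit of each group."""
    return {max(versions)[1] for versions in file_groups.values()}

def filter_latest(rows, manifest):
    """rows: iterable of (file_name, record). Keep records only from manifest files."""
    return [record for file_name, record in rows if file_name in manifest]

file_groups = {
    "fg-1": [("20220101", "fg-1_20220101.parquet"),
             ("20220103", "fg-1_20220103.parquet")],  # rewritten by an update
    "fg-2": [("20220102", "fg-2_20220102.parquet")],
}
manifest = latest_snapshot_manifest(file_groups)
rows = [("fg-1_20220101.parquet", {"id": 1, "price": 10.0}),   # stale version
        ("fg-1_20220103.parquet", {"id": 1, "price": 12.5}),   # latest version
        ("fg-2_20220102.parquet", {"id": 2, "price": 7.0})]
print(sorted(manifest))
print(filter_latest(rows, manifest))
```

In the actual tool, the same filtering happens inside BigQuery: the manifest table is joined against the external parquet table so only rows from latest-version files survive.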
+
+### Steps to create Hudi table on BigQuery
+
+1. Let's say you have Hudi table data on Google Cloud Storage (GCS).
+
+ ```
+CREATE TABLE dwh.bq_demo_partitioned_cow (
+  id bigint, 
+  name string,
+  price double,
+  ts bigint,
+  dt string
+) 
+using hudi 
+partitioned by (dt)
+options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts',
+  hoodie.datasource.write.drop.partition.columns = 'true'
+)
+location 'gs://hudi_datasets/bq_demo_partitioned_cow/';
+```
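With `hoodie.datasource.write.drop.partition.columns = 'true'` as in the table definition above, the `dt` value survives only in the hive-style GCS path (e.g. `.../dt=2021-12-01/<file>.parquet`), which is how BigQuery's hive-partitioning support can recover it. A minimal sketch of that path parsing, purely illustrative and not BigQuery's implementation:

```python
# Sketch: recover hive-style partition key/value pairs from a file path.
def partition_values(path):
    parts = {}
    for segment in path.split("/"):
        # Partition directories look like "key=value"; skip the data file itself.
        if "=" in segment and not segment.endswith(".parquet"):
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

print(partition_values(
    "gs://hudi_datasets/bq_demo_partitioned_cow/dt=2021-12-01/abc.parquet"))
```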
+
+BigQuery doesn't accept the partition column in the parquet schema, hence we need to drop the partition columns from the

Review comment:
       This actually prevents deltastreamer from using bq sync, as the drop column config does not work with delta streamer, does it? @vingov 







[GitHub] [hudi] vingov commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
vingov commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086787943


   @xushiyan or @vinothchandar - Since the implementation PR has been merged, can someone merge this PR as well? thanks!





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086796882


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783",
       "triggerID" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d3c8b412fb82ba727c424c312e3d43b4976699cd",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3c8b412fb82ba727c424c312e3d43b4976699cd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086796527


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783",
       "triggerID" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-3534] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1086800007


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5542",
       "triggerID" : "0ec28cc6531761ee4e10700940560129d9016bb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5545",
       "triggerID" : "4f51ef51999c35d29f08fcefc5ae6af72b35eef1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5549",
       "triggerID" : "0cbc964f6c87d2b66a585b358a7dc9dab28f0418",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783",
       "triggerID" : "76ac61d8af933353f836bf0cbb399e68f44e6c2c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1efd64a432a2adeda514ee4ab279fdf97b990c7b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "72062cc4dda1aa5ddd19212960d9ade27214ca0e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d3c8b412fb82ba727c424c312e3d43b4976699cd",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3c8b412fb82ba727c424c312e3d43b4976699cd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d392a93e4f21ff85cb4a57ad401971dd16a60541",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7785",
       "triggerID" : "d392a93e4f21ff85cb4a57ad401971dd16a60541",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 76ac61d8af933353f836bf0cbb399e68f44e6c2c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7783) 
   * 2f4a63ce2c4e8dbb6c7d931476c7c38c1b9151a6 UNKNOWN
   * 1efd64a432a2adeda514ee4ab279fdf97b990c7b UNKNOWN
   * 72062cc4dda1aa5ddd19212960d9ade27214ca0e UNKNOWN
   * d3c8b412fb82ba727c424c312e3d43b4976699cd UNKNOWN
   * d392a93e4f21ff85cb4a57ad401971dd16a60541 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7785) 
   





[GitHub] [hudi] hudi-bot commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4503:
URL: https://github.com/apache/hudi/pull/4503#issuecomment-1004558109


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878",
       "triggerID" : "6c79ce6e2ff6eec244d11d35f3556fd5356803cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6c79ce6e2ff6eec244d11d35f3556fd5356803cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4878) 
   

