Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/26 20:23:52 UTC

[GitHub] [hudi] manojpec opened a new pull request #3871: [HUDI-2593] Disabling meta fields in the metadata table

manojpec opened a new pull request #3871:
URL: https://github.com/apache/hudi/pull/3871


   ## What is the purpose of the pull request
   
   Meta fields such as `_hoodie_record_key` and `_hoodie_commit_time` are not needed for the metadata table, so this change disables them.
   
   ## Brief change log
   
   The `HoodieWriteConfig` used for `HoodieBackedTableMetadataWriter` is now built with the meta fields property disabled.
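
A minimal sketch of the kind of override described above, assuming the property key is `hoodie.populate.meta.fields` (the PR diff itself is not shown in this thread). The class and method names are hypothetical, and plain `java.util.Properties` stands in for the real writer config builder.

```java
import java.util.Properties;

// Hypothetical helper, not a Hudi class: it shows only the property override.
final class MetadataWriterPropsSketch {
    // Assumed key name for the "populate meta fields" table property.
    static final String POPULATE_META_FIELDS = "hoodie.populate.meta.fields";

    /** Copies the data-table writer properties and turns meta-field population off. */
    static Properties forMetadataTable(Properties dataTableProps) {
        Properties props = new Properties();
        props.putAll(dataTableProps);
        // The metadata table writer gets its own copy, so the data table config is untouched.
        props.setProperty(POPULATE_META_FIELDS, "false");
        return props;
    }
}
```

Copying the properties rather than mutating them keeps the data table's own setting unchanged, which matters because only the metadata table writer is meant to drop the meta fields.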
   
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-958638352


   But the partition paths for the metadata table are hardcoded. Could that help? Removing the fields would save a lot of storage space for the record-level index. 



[GitHub] [hudi] nsivabalan commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-958613451


   Here is the actual reason we punted on it for now.
   With virtual keys, we may need to regenerate the record key and partition path from the rest of the columns (the payload). The metadata payload schema does not store the partition path, so we can't regenerate the (record key, partition path) pair from the payload without the actual meta fields.
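
The limitation above can be sketched in a few lines. This is an illustration under stated assumptions, not Hudi code: `VirtualKeySketch`, `KeyPair`, `deriveKey`, and the column names used with it are all hypothetical. It shows a key being recomputed from payload columns, and the derivation failing when the payload has no partition column.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration; none of these types are Hudi classes.
final class VirtualKeySketch {

    /** The (record key, partition path) pair a writer needs to locate a record. */
    static final class KeyPair {
        final String recordKey;
        final String partitionPath;

        KeyPair(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }
    }

    /**
     * With virtual keys there are no stored meta columns, so both parts of the key
     * must be recomputed from payload columns. Returns empty when either column is absent.
     */
    static Optional<KeyPair> deriveKey(Map<String, String> payload,
                                       String recordKeyField,
                                       String partitionPathField) {
        String recordKey = payload.get(recordKeyField);
        String partitionPath = payload.get(partitionPathField);
        if (recordKey == null || partitionPath == null) {
            return Optional.empty();
        }
        return Optional.of(new KeyPair(recordKey, partitionPath));
    }
}
```

A user-table row that carries both columns yields a key pair. A metadata-style row that carries only a key column yields `Optional.empty()`, which is the case described above.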
   



[GitHub] [hudi] hudi-bot commented on pull request #3871: [HUDI-2593] Disabling meta fields in the metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-952298117


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff9411c410ffbc5ce9b18b7ec97e226764062010",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ff9411c410ffbc5ce9b18b7ec97e226764062010",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff9411c410ffbc5ce9b18b7ec97e226764062010 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] manojpec commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
manojpec commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-958713288


   > But partition path for the metadata table are hardcoded. Can that be helpful? Removing the fields will save a lot of storage space from record level index.
   
   @prashantwason So far we only have the `files` partition under the metadata table. But we are planning to bring in more partitions for storing other indices, so the assumption of a single partition for the metadata table will not hold for long. Otherwise, removing the 5 meta fields from each record by enabling virtual keys would definitely save a lot of space. For now, we either have to improve the current metadata schema or infer the partition path from other cues. 
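
A rough sketch of the two options mentioned above, with loud assumptions: only the `files` partition name comes from this thread. The record type tags, the lookup table, and every class and method name here are hypothetical stand-ins for whatever "other cues" would actually be used.

```java
import java.util.Map;

// Hypothetical illustration; only the `files` partition name comes from the thread.
final class MetadataPartitionSketch {
    static final String FILES_PARTITION = "files";

    /** Single-partition shortcut: every metadata record is assumed to live in `files`. */
    static String partitionForSinglePartitionTable() {
        return FILES_PARTITION;
    }

    /**
     * Once there are several partitions, a constant no longer works, so the
     * partition is looked up from some other cue (here a made-up record type tag).
     */
    static String partitionFor(String recordType, Map<String, String> typeToPartition) {
        String partition = typeToPartition.get(recordType);
        if (partition == null) {
            throw new IllegalArgumentException("No partition known for record type: " + recordType);
        }
        return partition;
    }
}
```

The constant is the cheapest option but encodes the single-partition assumption. The lookup keeps working as partitions are added, at the cost of requiring every record to carry a usable cue.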



[GitHub] [hudi] manojpec commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
manojpec commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-954362940


   Virtual keys cannot be enabled for the metadata table: the KeyGenerator needed for virtual key generation doesn't differentiate between user data tables and metadata tables, so it always looks for the meta fields, which metadata tables don't have. 



[GitHub] [hudi] prashantwason commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-954973982


   @manojpec Can you please give more details on why virtual keys don't work? Is this a limitation of the metadata table schema or of the way virtual key support is implemented?
   
   The metadata table records are very small, so the overhead of the Hudi metadata columns is very high. Hence, virtual key support would greatly reduce the size of the metadata table. 



[GitHub] [hudi] manojpec closed pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
manojpec closed pull request #3871:
URL: https://github.com/apache/hudi/pull/3871


   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3871: [HUDI-2593] Disabling meta fields in the metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-952298117


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff9411c410ffbc5ce9b18b7ec97e226764062010",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2868",
       "triggerID" : "ff9411c410ffbc5ce9b18b7ec97e226764062010",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff9411c410ffbc5ce9b18b7ec97e226764062010 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2868) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] manojpec commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
manojpec commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-965899090


   @prashantwason WIP PR for adding virtual keys support for metadata table is at https://github.com/apache/hudi/pull/3968. Thanks for the patience. 



[GitHub] [hudi] hudi-bot edited a comment on pull request #3871: [HUDI-2593] Enabling virtual keys for the metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-952298117


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ff9411c410ffbc5ce9b18b7ec97e226764062010",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2868",
       "triggerID" : "ff9411c410ffbc5ce9b18b7ec97e226764062010",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff9411c410ffbc5ce9b18b7ec97e226764062010 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2868) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

