You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/11 19:03:14 UTC

[GitHub] [hudi] codope opened a new pull request, #5293: [HUDI-3844] Update props in indexer based on table config

codope opened a new pull request, #5293:
URL: https://github.com/apache/hudi/pull/5293

   ## What is the purpose of the pull request
   
   HUDI-3844
   
   Without this patch, the indexer presumes only those metadata indexes are enabled which are passed to it in the props. Any other metadata index (other than FILES ofc) created by regular writers will get deleted. Indexer should not presume, instead reset its config if any MDT partition is available.
   
   ## Brief change log
   
     - Update props in indexer based on table config.
     - Guard MDT partitions init by async index config in metadata writer.
     - Add UT to cover the scenario.
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5293:
URL: https://github.com/apache/hudi/pull/5293#issuecomment-1095539539

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "dfedac584daf3c648c1b77906145aa000cf0b2cc",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7987",
       "triggerID" : "dfedac584daf3c648c1b77906145aa000cf0b2cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * dfedac584daf3c648c1b77906145aa000cf0b2cc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7987) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on a diff in pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #5293:
URL: https://github.com/apache/hudi/pull/5293#discussion_r847898428


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -379,21 +379,24 @@ protected <T extends SpecificRecordBase> void initializeIfNeeded(HoodieTableMeta
     }
 
     // if metadata table exists, then check if any of the enabled partition types needs to be initialized
-    Set<String> inflightAndCompletedPartitions = getInflightAndCompletedMetadataPartitions(dataMetaClient.getTableConfig());
-    List<MetadataPartitionType> partitionsToInit = this.enabledPartitionTypes.stream()
-        .filter(p -> !inflightAndCompletedPartitions.contains(p.getPartitionPath()) && !MetadataPartitionType.FILES.equals(p))
-        .collect(Collectors.toList());
+    // NOTE: It needs to be guarded by async index config because if that is enabled then initialization happens through the index scheduler.
+    if (!dataWriteConfig.isMetadataAsyncIndex()) {

Review Comment:
   It should work. I mean the bloom_filters partition should be initialized and updated by the regular writer. I also tested the following scenario. Let me know if this is not what you meant.
   1. First deltacommit by regular writer with default configs (so only files partition created).
   2. indexing commit by indexer only for column_stats -> now there is both files and column_stats partition.
   3. Second deltacommit by regular writer with all indexes enabled (i.e. both column_stats and bloom_filters) -> now this initializes the bloom_filter partition as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5293:
URL: https://github.com/apache/hudi/pull/5293#discussion_r847779941


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -379,21 +379,24 @@ protected <T extends SpecificRecordBase> void initializeIfNeeded(HoodieTableMeta
     }
 
     // if metadata table exists, then check if any of the enabled partition types needs to be initialized
-    Set<String> inflightAndCompletedPartitions = getInflightAndCompletedMetadataPartitions(dataMetaClient.getTableConfig());
-    List<MetadataPartitionType> partitionsToInit = this.enabledPartitionTypes.stream()
-        .filter(p -> !inflightAndCompletedPartitions.contains(p.getPartitionPath()) && !MetadataPartitionType.FILES.equals(p))
-        .collect(Collectors.toList());
+    // NOTE: It needs to be guarded by async index config because if that is enabled then initialization happens through the index scheduler.
+    if (!dataWriteConfig.isMetadataAsyncIndex()) {

Review Comment:
   will go ahead and land this. please put up a new patch if there is a gap.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan merged pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #5293:
URL: https://github.com/apache/hudi/pull/5293


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5293:
URL: https://github.com/apache/hudi/pull/5293#issuecomment-1095447501

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "dfedac584daf3c648c1b77906145aa000cf0b2cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "dfedac584daf3c648c1b77906145aa000cf0b2cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * dfedac584daf3c648c1b77906145aa000cf0b2cc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5293:
URL: https://github.com/apache/hudi/pull/5293#issuecomment-1095451354

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "dfedac584daf3c648c1b77906145aa000cf0b2cc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7987",
       "triggerID" : "dfedac584daf3c648c1b77906145aa000cf0b2cc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * dfedac584daf3c648c1b77906145aa000cf0b2cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7987) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5293: [HUDI-3844] Update props in indexer based on table config

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5293:
URL: https://github.com/apache/hudi/pull/5293#discussion_r847686429


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -379,21 +379,24 @@ protected <T extends SpecificRecordBase> void initializeIfNeeded(HoodieTableMeta
     }
 
     // if metadata table exists, then check if any of the enabled partition types needs to be initialized
-    Set<String> inflightAndCompletedPartitions = getInflightAndCompletedMetadataPartitions(dataMetaClient.getTableConfig());
-    List<MetadataPartitionType> partitionsToInit = this.enabledPartitionTypes.stream()
-        .filter(p -> !inflightAndCompletedPartitions.contains(p.getPartitionPath()) && !MetadataPartitionType.FILES.equals(p))
-        .collect(Collectors.toList());
+    // NOTE: It needs to be guarded by async index config because if that is enabled then initialization happens through the index scheduler.
+    if (!dataWriteConfig.isMetadataAsyncIndex()) {

Review Comment:
   except FILES partition, wrt every other MDT partition, either it will be synchronous or it will be be async. guess the fix ensures that right. 
   
   But tell me if this is feasible. not required to be fixed in this patch. 
   enable FILES via regular writer. 
   enable col stats via async indexer. 
   after sometime, enable bloom filter synchronously via regular writer? -> From the looks of the change, guess this is not feasible I guess. or am I missing something
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org