Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/07 09:20:38 UTC

[GitHub] [hudi] prashantwason opened a new pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

prashantwason opened a new pull request #3427:
URL: https://github.com/apache/hudi/pull/3427


   
   ## What is the purpose of the pull request
   
   Metadata Table should only be synced from a single pipeline to prevent conflicts.
   
   ## Brief change log
   
   Added a config which can be used to disable syncing of the metadata table.
   
   ## Verify this pull request
   
   A unit test has been added.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r688054377



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -141,6 +148,10 @@ public boolean useFileListingMetadata() {
     return getBoolean(METADATA_ENABLE_PROP);

Review comment:
       Has been renamed.





[GitHub] [hudi] prashantwason commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687991692



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       You are right.
   
   I feel the correct way would be to have the transaction within the Metadata Table itself, so we don't have to worry about which code path it is called from.





[GitHub] [hudi] prashantwason commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-897818356


   @vinothchandar Yes, let's go with sync only for commits. Other enhancements can come later.



[GitHub] [hudi] leesf commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r684696928



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       What's the difference between this and the config `hoodie.metadata.enable`? Can we just reuse that config?





[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462",
       "triggerID" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] prashantwason commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687252668



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       @leesf The current problem is that the metadata table is not safe to write from multiple pipelines concurrently. So we need to keep it enabled in all pipelines but have only one pipeline write to it.
   
   Two alternative suggestions:
   1. Change the name of this config:
   Writing to the metadata table is called "sync" here. We could instead call it read-only mode:
   hoodie.metadata.readonly=true?
   
   2. Introduce a metadata table mode instead of enable/disable:
   hoodie.metadata.mode=[disabled,readwrite,readonly]. Both readwrite and readonly imply enabled=true, so users provide only the single mode config.
   
   I also do not want another config to confuse users. The other option is to enable multi-writer mode for the metadata table. I have that implemented as part of another PR (metadata table next version): https://github.com/apache/hudi/pull/3426/commits/f417390aed56c05dbaedeac6ecf47294996bb591
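A minimal sketch of option 2, assuming each proposed `hoodie.metadata.mode` value maps onto two booleans: whether the metadata table is used at all, and whether this pipeline may write (sync) to it. `MetadataMode`, `isEnabled()`, `isWritable()` and `fromString()` are hypothetical names for illustration; this is only the proposal discussed here, not code from the PR or a Hudi API.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the proposed hoodie.metadata.mode setting.
 * Not part of Hudi; names are illustrative only.
 */
enum MetadataMode {
  DISABLED(false, false),
  READWRITE(true, true),
  READONLY(true, false);

  private final boolean enabled;   // metadata table is used by this pipeline
  private final boolean writable;  // this pipeline may sync (write) to it

  MetadataMode(boolean enabled, boolean writable) {
    this.enabled = enabled;
    this.writable = writable;
  }

  boolean isEnabled() {
    return enabled;
  }

  boolean isWritable() {
    return writable;
  }

  /** Parses values such as "readonly", ignoring case and surrounding spaces. */
  static MetadataMode fromString(String value) {
    return MetadataMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    for (String v : new String[] {"disabled", "readwrite", "readonly"}) {
      MetadataMode m = fromString(v);
      System.out.println(v + " -> enabled=" + m.isEnabled() + ", writable=" + m.isWritable());
    }
  }
}
```

Encoding the mode as a single enum makes the invalid combination (writable but not enabled) unrepresentable, which is the main appeal over two independent boolean configs.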
   
   





[GitHub] [hudi] vinothchandar commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r686385601



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       That just controls whether the Hudi writer should populate the metadata table at all. This PR introduces a writer-level config which can be used to, say, prevent compaction/clustering etc. from syncing to the metadata table.
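To make the distinction concrete, here is a small sketch of how a writer could combine the two flags. `hoodie.metadata.enable` appears in this thread and `hoodie.metadata.sync.enable` is the key name used in the review discussion; the defaults chosen here (`false` and `true`) and the `MetadataFlags` helper are assumptions for illustration, not Hudi's `HoodieMetadataConfig`.

```java
import java.util.Properties;

/**
 * Sketch of how the two flags discussed in this thread relate.
 * Key names come from the review comments; the defaults are assumptions.
 */
final class MetadataFlags {
  static final String ENABLE_KEY = "hoodie.metadata.enable";
  static final String SYNC_ENABLE_KEY = "hoodie.metadata.sync.enable";

  private MetadataFlags() {}

  /** Whether this writer uses the metadata table at all (assumed default: false). */
  static boolean isEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(ENABLE_KEY, "false"));
  }

  /** Whether this writer also writes (syncs) to it; only meaningful when enabled. */
  static boolean shouldSync(Properties props) {
    return isEnabled(props)
        && Boolean.parseBoolean(props.getProperty(SYNC_ENABLE_KEY, "true"));
  }

  public static void main(String[] args) {
    // The single pipeline that owns writes to the metadata table.
    Properties owner = new Properties();
    owner.setProperty(ENABLE_KEY, "true");

    // Every other pipeline keeps the table enabled but does not sync it.
    Properties other = new Properties();
    other.setProperty(ENABLE_KEY, "true");
    other.setProperty(SYNC_ENABLE_KEY, "false");

    System.out.println("owner syncs: " + shouldSync(owner));
    System.out.println("other syncs: " + shouldSync(other) + ", other enabled: " + isEnabled(other));
  }
}
```

The `owner`/`other` pair mirrors the setup described in this thread: keep the metadata table enabled in all pipelines, but let only one of them sync it.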





[GitHub] [hudi] vinothchandar commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-897890624


   I confirmed that clean does not go through the commitStats path. The changes should be good as-is.



[GitHub] [hudi] vinothchandar commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r688073272



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       Given this is the last minute before the RC cut, I suggest we keep this for now. The checks I added should already alleviate the single-writer case with async cleaning, compaction, and clustering.
   
   We can pour our energy into the sync design and maybe even remove this config.





[GitHub] [hudi] vinothchandar commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687991881



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -403,7 +403,9 @@ protected void preWrite(String instantTime, WriteOperationType writeOperationTyp
         .isPresent()
         ? Option.of(lastCompletedTxnAndMetadata.get().getLeft()) : Option.empty());
     try {
-      syncTableMetadata();
+      if (writeOperationType != WriteOperationType.CLUSTER && writeOperationType != WriteOperationType.COMPACT) {

Review comment:
       Clean does not go through preWrite/postCommit, IIRC, so we should already be OK. I'll double-check this and also the callers of commitStats.





[GitHub] [hudi] vinothchandar commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687997217



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -403,7 +403,9 @@ protected void preWrite(String instantTime, WriteOperationType writeOperationTyp
         .isPresent()
         ? Option.of(lastCompletedTxnAndMetadata.get().getLeft()) : Option.empty());
     try {
-      syncTableMetadata();
+      if (writeOperationType != WriteOperationType.CLUSTER && writeOperationType != WriteOperationType.COMPACT) {

Review comment:
       I'll reverse the check, to be conservative. Good callout.
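The difference between the check in the diff (a denylist) and a "reversed" one (an allowlist) can be sketched as follows. The nested `OpType` enum is an illustrative stand-in for a few values of Hudi's `WriteOperationType`, and the allowlist contents are an assumption (regular commit operations, in line with "sync only for commits"); the reversed check actually committed to the PR may list different operations.

```java
import java.util.EnumSet;
import java.util.Set;

final class SyncGuard {
  /** Illustrative stand-in for a few values of Hudi's WriteOperationType. */
  enum OpType { INSERT, UPSERT, BULK_INSERT, DELETE, CLUSTER, COMPACT, UNKNOWN }

  /** Hypothetical allowlist: only regular commit operations sync the metadata table. */
  private static final Set<OpType> SYNC_ALLOWED =
      EnumSet.of(OpType.INSERT, OpType.UPSERT, OpType.BULK_INSERT, OpType.DELETE);

  private SyncGuard() {}

  /** Denylist, as in the diff: every operation except CLUSTER and COMPACT syncs. */
  static boolean shouldSyncDenylist(OpType op) {
    return op != OpType.CLUSTER && op != OpType.COMPACT;
  }

  /** Allowlist ("reversed" check): anything not explicitly listed does not sync. */
  static boolean shouldSyncAllowlist(OpType op) {
    return SYNC_ALLOWED.contains(op);
  }

  public static void main(String[] args) {
    for (OpType op : OpType.values()) {
      System.out.printf("%-11s denylist=%b allowlist=%b%n",
          op, shouldSyncDenylist(op), shouldSyncAllowlist(op));
    }
  }
}
```

With the denylist, an operation type added later (or `UNKNOWN`) syncs by default; with the allowlist it does not, which is why reversing the check is the conservative choice.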





[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462",
       "triggerID" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] vinothchandar commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r686385905



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -141,6 +148,10 @@ public boolean useFileListingMetadata() {
     return getBoolean(METADATA_ENABLE_PROP);

Review comment:
       Ideally we should rename this method.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -192,6 +192,54 @@ public void testMetadataTableBootstrap() throws Exception {
     }
   }
 
+  /**
+   * Test enable/disable sync via the config.
+   */
+  @Test
+  public void testSyncConfig() throws Exception {
+    init(HoodieTableType.COPY_ON_WRITE);
+    HoodieSparkEngineContext engineContext = new HoodieSparkEngineContext(jsc);
+
+    // Create the metadata table
+    String firstCommitTime = HoodieActiveTimeline.createNewInstantTime();
+    try (SparkRDDWriteClient client = new SparkRDDWriteClient(engineContext, getWriteConfig(true, true), true)) {
+      client.startCommitWithTime(firstCommitTime);
+      client.insert(jsc.parallelize(dataGen.generateInserts(firstCommitTime, 2)), firstCommitTime);
+      client.syncTableMetadata();
+      assertTrue(fs.exists(new Path(metadataTableBasePath)));
+      validateMetadata(client);
+    }
+
+    // If sync is disabled, the table will not sync
+    HoodieWriteConfig config = getWriteConfigBuilder(true, true, false)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder()
+            .enable(true).enableMetrics(false).enableSync(false).build()).build();
+    final String metadataTableMetaPath = metadataTableBasePath + Path.SEPARATOR + HoodieTableMetaClient.METAFOLDER_NAME;
+    String secondCommitTime = HoodieActiveTimeline.createNewInstantTime();
+    try (SparkRDDWriteClient client = new SparkRDDWriteClient(engineContext, config, true)) {
+      client.startCommitWithTime(secondCommitTime);
+      client.insert(jsc.parallelize(dataGen.generateInserts(secondCommitTime, 2)), secondCommitTime);

Review comment:
       Wondering if there is a lighter-weight way of testing this, without incurring the full write path.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462",
       "triggerID" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5f29c5f97729319bc31f550b91834533326a454f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1692",
       "triggerID" : "5f29c5f97729319bc31f550b91834533326a454f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462) 
   * 5f29c5f97729319bc31f550b91834533326a454f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1692) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] prashantwason commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-897222618


   To summarize:
   
   1. Async [clean, compact, cluster] - these run in the same process, but in parallel threads
       - We can take a lock in syncMetadata()
       - We can sync only as part of commit
       - We can depend on transaction support
   2. Parallel [clean, compact, cluster] - these are in different processes
       - We can use the proposed setting to disable sync in all but one pipeline
       - We can depend on transaction support
   
   Only transaction support fixes both cases, but it may not land by this weekend (though [I have a patch for you to review](https://github.com/apache/hudi/commit/f417390aed56c05dbaedeac6ecf47294996bb591)). In the short run, we need to:
   1. Enable the setting in this PR
   2. Handle the async operations as @vinothchandar is proposing.
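For case 1, "take a lock in syncMetadata()" can be sketched with a JVM-local `ReentrantLock`. `LockedMetadataSync` and its in-memory list (standing in for the metadata table) are hypothetical; such a lock does nothing for case 2, where the writers are separate processes and only the config in this PR or real transaction support helps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: serialize metadata syncs from async services running as
 * threads in one process. A JVM-local lock cannot protect other processes.
 */
final class LockedMetadataSync {
  private final ReentrantLock syncLock = new ReentrantLock();
  private final List<String> syncedInstants = new ArrayList<>(); // stands in for the metadata table

  /** Applies one completed instant; only one thread at a time passes the lock. */
  void syncTableMetadata(String instant) {
    syncLock.lock();
    try {
      if (!syncedInstants.contains(instant)) { // skip instants that are already synced
        syncedInstants.add(instant);
      }
    } finally {
      syncLock.unlock();
    }
  }

  /** Returns a snapshot copy of the synced instants, in sync order. */
  List<String> synced() {
    syncLock.lock();
    try {
      return new ArrayList<>(syncedInstants);
    } finally {
      syncLock.unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    LockedMetadataSync sync = new LockedMetadataSync();
    // Three threads, e.g. the writer plus async clean and async compaction.
    ExecutorService pool = Executors.newFixedThreadPool(3);
    for (int i = 0; i < 30; i++) {
      String instant = "instant-" + (i % 10);
      pool.submit(() -> sync.syncTableMetadata(instant));
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("synced " + sync.synced().size() + " distinct instants");
  }
}
```

Without the lock, two threads could both pass the `contains` check and record the same instant twice; the lock makes check-then-apply atomic within the process.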
   



[GitHub] [hudi] vinothchandar commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-897245071


   Yes, things will be smoother with multi-table txn support (I think that's what you are referring to, the sync metadata table redesign).
   
   For now, I feel we can outright skip syncing for cluster and compact, regardless of whether they run synchronously, asynchronously, or in parallel. The metadata table will then be out of sync by some actions, which we can handle anyway.
   
   I'll add that change to this PR if you also agree, @prashantwason.



[GitHub] [hudi] leesf commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687492662



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       I think enabling multi-writer mode for the metadata table is a better solution. I also went through the metadata sync code: in `preWrite`, syncing the metadata happens inside a transaction (so it should be OK in a multi-writer pipeline?), and as far as I can see, the difference between setting `hoodie.metadata.enable` to true and setting `hoodie.metadata.sync.enable` to true is whether `SparkHoodieBackedTableMetadataWriter` gets created; in `postWrite`, however, the sync is not inside a transaction (so it should not be OK in a multi-writer pipeline). Would you please correct me if I missed some other aspect? @prashantwason





[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462",
       "triggerID" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5f29c5f97729319bc31f550b91834533326a454f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1692",
       "triggerID" : "5f29c5f97729319bc31f550b91834533326a454f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "26bcff6f34aecbd09dea7e16330e5914703110e5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1695",
       "triggerID" : "26bcff6f34aecbd09dea7e16330e5914703110e5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26bcff6f34aecbd09dea7e16330e5914703110e5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1695) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] danny0405 commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
danny0405 commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-898101210


   +1 to @leesf, these two config options confuse even us a lot, let alone users.



[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462",
       "triggerID" : "a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   ## CI report:
   
   * 5f29c5f97729319bc31f550b91834533326a454f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1692) 
   * 26bcff6f34aecbd09dea7e16330e5914703110e5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1695) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   ## CI report:
   
   * 5f29c5f97729319bc31f550b91834533326a454f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1692) 
   * 26bcff6f34aecbd09dea7e16330e5914703110e5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462) 
   * 5f29c5f97729319bc31f550b91834533326a454f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1692) 
   * 26bcff6f34aecbd09dea7e16330e5914703110e5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] vinothchandar commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r688006417



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -403,7 +403,9 @@ protected void preWrite(String instantTime, WriteOperationType writeOperationTyp
         .isPresent()
         ? Option.of(lastCompletedTxnAndMetadata.get().getLeft()) : Option.empty());
     try {
-      syncTableMetadata();
+      if (writeOperationType != WriteOperationType.CLUSTER && writeOperationType != WriteOperationType.COMPACT) {

Review comment:
       The check is cumbersome to write otherwise: `writeOperationType` is upsert, delete, insert, etc., not the action type. Will keep it as-is.





[GitHub] [hudi] prashantwason commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r688054094



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -403,7 +403,9 @@ protected void preWrite(String instantTime, WriteOperationType writeOperationTyp
         .isPresent()
         ? Option.of(lastCompletedTxnAndMetadata.get().getLeft()) : Option.empty());
     try {
-      syncTableMetadata();
+      if (writeOperationType != WriteOperationType.CLUSTER && writeOperationType != WriteOperationType.COMPACT) {

Review comment:
       OK, fine this way too.





[GitHub] [hudi] prashantwason merged pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason merged pull request #3427:
URL: https://github.com/apache/hudi/pull/3427


   



[GitHub] [hudi] leesf commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687492662



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       I think enabling multi-writer mode for the metadata table is a better solution. Going through the metadata sync codebase: in `preWrite`, metadata syncing happens inside a transaction (so it should be OK in a multi-writer pipeline?), and the only difference I see between setting `hoodie.metadata.enable` to `false` and setting `hoodie.metadata.sync.enable` to `false` is whether `SparkHoodieBackedTableMetadataWriter` is created. In `postWrite`, however, syncing is not inside a transaction (so it would not be safe in a multi-writer pipeline). Please correct me if I missed any other aspect. @prashantwason 
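The reading of the two flags in the comment above can be expressed as a small decision table. This is a minimal sketch, not verified Hudi code: it uses the property keys `hoodie.metadata.enable` and `hoodie.metadata.sync.enable` as named in this thread, a hypothetical `MetadataMode` enum, and assumed defaults (a missing `hoodie.metadata.enable` is treated as `false`, a missing `hoodie.metadata.sync.enable` as `true`); check the released `HoodieMetadataConfig` for the real defaults and behavior.

```java
import java.util.Properties;

class MetadataFlags {

  // Hypothetical outcome of combining the two flags, following the reading in the comment above.
  enum MetadataMode {
    DISABLED,            // no metadata writer is created and nothing is synced
    WRITER_ONLY_NO_SYNC, // a metadata writer is created, but this pipeline does not sync
    WRITER_AND_SYNC      // a metadata writer is created and this pipeline syncs
  }

  static final String ENABLE_KEY = "hoodie.metadata.enable";
  static final String SYNC_ENABLE_KEY = "hoodie.metadata.sync.enable";

  static MetadataMode resolve(Properties props) {
    // Assumed defaults, for illustration only.
    boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLE_KEY, "false"));
    boolean syncEnabled = Boolean.parseBoolean(props.getProperty(SYNC_ENABLE_KEY, "true"));
    if (!enabled) {
      // The sync flag is irrelevant when the metadata table itself is off.
      return MetadataMode.DISABLED;
    }
    return syncEnabled ? MetadataMode.WRITER_AND_SYNC : MetadataMode.WRITER_ONLY_NO_SYNC;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(ENABLE_KEY, "true");
    props.setProperty(SYNC_ENABLE_KEY, "false");
    System.out.println(resolve(props)); // WRITER_ONLY_NO_SYNC
  }
}
```

The table makes the naming concern concrete: only the `enable=true, sync.enable=false` combination differs from simply turning the metadata table off, and it differs only in whether a writer object exists.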





[GitHub] [hudi] hudi-bot edited a comment on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-894630091


   ## CI report:
   
   * a675ac3b9d8a26f5dbd9aa9747ac157b2c3d5c8b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1462) 
   * 5f29c5f97729319bc31f550b91834533326a454f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] leesf commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r686948797



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       This config and `hoodie.metadata.enable` confused me. From the user's end, when `hoodie.metadata.enable` is set to true, we expect Hudi to create and sync the metadata table; introducing another, similar config here would cause confusion.





[GitHub] [hudi] vinothchandar commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-896974349


   @leesf has a point; maybe we should differentiate between user-level and system-level configs. 
   @prashantwason IIUC, without multi-writer turned on, we would expect users to turn sync off in all but one pipeline? 
   Otherwise, this config is mostly for turning off sync from async clean, compaction, and clustering. 
   
   @leesf do you have any suggestions on the naming?
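If, without multi-writer support, users are expected to leave sync on in exactly one pipeline, a deployment could check that up front. This is a minimal sketch under stated assumptions: each pipeline's settings are available as a `java.util.Properties` keyed by pipeline name, the key is `hoodie.metadata.sync.enable` as named in this thread, a missing value is treated as `true`, and `validateSingleSyncer` is a hypothetical helper, not part of Hudi.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class SingleSyncerCheck {

  static final String SYNC_ENABLE_KEY = "hoodie.metadata.sync.enable";

  // Returns the names of pipelines that would sync the metadata table.
  static List<String> syncingPipelines(Map<String, Properties> pipelines) {
    List<String> syncing = new ArrayList<>();
    for (Map.Entry<String, Properties> e : pipelines.entrySet()) {
      // Assumed default of "true" when the key is absent (illustrative only).
      if (Boolean.parseBoolean(e.getValue().getProperty(SYNC_ENABLE_KEY, "true"))) {
        syncing.add(e.getKey());
      }
    }
    return syncing;
  }

  // Hypothetical guard: fail fast if more than one pipeline would sync.
  static void validateSingleSyncer(Map<String, Properties> pipelines) {
    List<String> syncing = syncingPipelines(pipelines);
    if (syncing.size() > 1) {
      throw new IllegalStateException("Metadata table sync enabled in multiple pipelines: " + syncing);
    }
  }

  public static void main(String[] args) {
    Properties ingest = new Properties();            // ingestion job keeps sync on (assumed default)
    Properties compactor = new Properties();
    compactor.setProperty(SYNC_ENABLE_KEY, "false"); // async compaction job turns sync off
    Map<String, Properties> pipelines = new LinkedHashMap<>();
    pipelines.put("ingest", ingest);
    pipelines.put("compactor", compactor);
    validateSingleSyncer(pipelines);
    System.out.println("syncing: " + syncingPipelines(pipelines)); // syncing: [ingest]
  }
}
```

A check like this only works where one process can see every pipeline's configuration, which is part of why relying on multi-writer locking instead, as discussed in this thread, is attractive.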



[GitHub] [hudi] prashantwason commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r687984903



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -403,7 +403,9 @@ protected void preWrite(String instantTime, WriteOperationType writeOperationTyp
         .isPresent()
         ? Option.of(lastCompletedTxnAndMetadata.get().getLeft()) : Option.empty());
     try {
-      syncTableMetadata();
+      if (writeOperationType != WriteOperationType.CLUSTER && writeOperationType != WriteOperationType.COMPACT) {

Review comment:
       What about AsyncClean? Shouldn't we instead reverse the check:
   
   if (writeOperationType == COMMIT || writeOperationType == DELTACOMMIT) {
     ...
   }
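The difference between the merged deny-list check and the proposed reversed (allow-list) check shows up for any operation neither list names, such as an async clean. This is a minimal sketch using a simplified, hypothetical `OpType` enum (not Hudi's `WriteOperationType`; the `CLEAN` value is included purely to illustrate the question). It uses write operations rather than COMMIT/DELTACOMMIT action types, since the reply elsewhere in this thread points out that `writeOperationType` is upsert, delete, insert, etc.

```java
import java.util.EnumSet;
import java.util.Set;

class SyncCheckComparison {

  // Hypothetical, simplified operation enum; CLEAN is illustrative only.
  enum OpType { INSERT, UPSERT, DELETE, CLUSTER, COMPACT, CLEAN }

  // Deny-list: sync for everything except the listed table services (shape of the merged check).
  private static final Set<OpType> DENY = EnumSet.of(OpType.CLUSTER, OpType.COMPACT);

  // Allow-list: sync only for regular data writes (shape of the proposed reversed check).
  private static final Set<OpType> ALLOW = EnumSet.of(OpType.INSERT, OpType.UPSERT, OpType.DELETE);

  static boolean denyListSyncs(OpType op) {
    return !DENY.contains(op);
  }

  static boolean allowListSyncs(OpType op) {
    return ALLOW.contains(op);
  }

  public static void main(String[] args) {
    for (OpType op : OpType.values()) {
      System.out.printf("%-8s deny-list=%b allow-list=%b%n", op, denyListSyncs(op), allowListSyncs(op));
    }
  }
}
```

The two checks agree on every value except `CLEAN`: the deny-list would sync during a clean, the allow-list would not. More generally, a deny-list opts any newly added operation into syncing by default, while an allow-list opts it out.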
   
   





[GitHub] [hudi] leesf commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r684696928



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -44,6 +44,13 @@
       .sinceVersion("0.7.0")
       .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings");
 
+  // Enable syncing the Metadata Table
+  public static final ConfigProperty<Boolean> METADATA_SYNC_ENABLE_PROP = ConfigProperty

Review comment:
       What's the difference between this and the `hoodie.metadata.enable` config? Can we reuse that config?





[GitHub] [hudi] vinothchandar commented on pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#issuecomment-897179236


   I am thinking about how to turn this off across the different paths: Spark datasource writer, DeltaStreamer, and async compaction, cleaning, and clustering jobs. 
   
   From the code, I see that syncing happens in preWrite and postCommit. So is the main issue that `WriteClient#compact()` and `WriteClient#cluster` calls invoke sync? I wonder if we should also check the write operation type and avoid syncing during these operations. Then, as long as multi-writer is turned on, it should work and be sane. 
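A toy model of the proposal above: both hook points consult the same operation-type check, so the compact and cluster entry points never trigger a sync. Everything here (`ToyWriteClient`, its methods, the `OpType` enum, the sync counter) is hypothetical; only the hook names `preWrite` and `postCommit` come from the comment, and the sketch models call ordering only, not transactions or locking.

```java
import java.util.EnumSet;
import java.util.Set;

class ToyWriteClient {

  // Hypothetical, simplified operation enum.
  enum OpType { INSERT, UPSERT, DELETE, CLUSTER, COMPACT }

  // Table services that should not sync the metadata table.
  private static final Set<OpType> TABLE_SERVICES = EnumSet.of(OpType.CLUSTER, OpType.COMPACT);

  private int syncCount = 0; // number of times the (simulated) metadata sync ran

  // Single shared gate used by both hooks.
  private void maybeSync(OpType op) {
    if (!TABLE_SERVICES.contains(op)) {
      syncCount++; // stands in for syncTableMetadata()
    }
  }

  private void preWrite(OpType op) { maybeSync(op); }

  private void postCommit(OpType op) { maybeSync(op); }

  // Every entry point goes through both hooks with its own operation type.
  private void run(OpType op) {
    preWrite(op);
    // ... the actual write or table service would run here ...
    postCommit(op);
  }

  void upsert() { run(OpType.UPSERT); }

  void compact() { run(OpType.COMPACT); }

  void cluster() { run(OpType.CLUSTER); }

  int getSyncCount() { return syncCount; }

  public static void main(String[] args) {
    ToyWriteClient client = new ToyWriteClient();
    client.upsert();
    client.compact();
    client.cluster();
    System.out.println("syncs = " + client.getSyncCount()); // syncs = 2 (upsert: preWrite + postCommit)
  }
}
```

Routing both hooks through one gate keeps the two call sites from drifting apart; if only `preWrite` were guarded, compaction and clustering would still sync from `postCommit`.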
   
   
   
   



[GitHub] [hudi] prashantwason commented on a change in pull request #3427: [HUDI-1292] Created a config to enable/disable syncing of metadata table.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #3427:
URL: https://github.com/apache/hudi/pull/3427#discussion_r686571228



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -141,6 +148,10 @@ public boolean useFileListingMetadata() {
     return getBoolean(METADATA_ENABLE_PROP);

Review comment:
       Yes, the name does not fit other such methods in HUDI.



