You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/09 10:00:29 UTC

[GitHub] [hudi] xicm opened a new pull request, #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

xicm opened a new pull request, #6899:
URL: https://github.com/apache/hudi/pull/6899

   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272546356

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b4e7ab1f3e92f0076a43953711a466f954961ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xicm commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272781782

   > The intention is to always infer `META_SYNC_PARTITION_FIELDS` and `META_SYNC_PARTITION_EXTRACTOR_CLASS` from table config or keygen options. So we should keep the inference consistent. If you manually overwrite `META_SYNC_PARTITION_FIELDS`, you want to overwrite `META_SYNC_PARTITION_EXTRACTOR_CLASS` as well.
   
   Hi @xushiyan , I want this inference to also work with META_SYNC_PARTITION_FIELDS, is it necessary?
   
   And another question, If HoodieTableConfig.PARTITION_FIELDS or KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME is null, META_SYNC_PARTITION_EXTRACTOR_CLASS will be default value MultiPartKeysValueExtractor. In general, partition field is null, the table may be a non-parition table, is the default extractor class reasonablle?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272785880

   > Hi @xushiyan , I want this inference to also work with META_SYNC_PARTITION_FIELDS, is it necessary?
   
   You might add META_SYNC_PARTITION_FIELDS as the 3rd config to infer after the table config and keygen option, to keep the inference consistent. But what are the cases that the hudi table does not have the partition table configs, and you can't specify the keygen options like for the writer configs ?
   
   > In general, partition field is null, the table may be a non-parition table, is the default extractor class reasonablle?
   
   MultiPartKeysValueExtractor also handles non-partitioned cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1283847838

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101",
       "triggerID" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "triggerType" : "PUSH"
     }, {
       "hash" : "87b19d8977713aef63b081813c8439dc113da637",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12338",
       "triggerID" : "87b19d8977713aef63b081813c8439dc113da637",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 34dc3ebd5324d1360333817beb251510ca301f95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101) 
   * 87b19d8977713aef63b081813c8439dc113da637 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12338) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xicm commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272805488

   ![image](https://user-images.githubusercontent.com/36392121/194803644-0164885a-d56f-4717-91f9-31cebea698f2.png)
   We also need to update the doc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
xicm commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r999298025


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")

Review Comment:
   OK, thanks for your explanation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1284350904

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101",
       "triggerID" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "triggerType" : "PUSH"
     }, {
       "hash" : "87b19d8977713aef63b081813c8439dc113da637",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12338",
       "triggerID" : "87b19d8977713aef63b081813c8439dc113da637",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 87b19d8977713aef63b081813c8439dc113da637 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12338) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xicm commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r991045336


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
-        if (!partitionFieldsOpt.isPresent()) {
+        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   META_SYNC_PARTITION_FIELDS is inferred from HoodieTableConfig.PARTITION_FIELDS and KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME, so I think it is ok to infrer META_SYNC_PARTITION_EXTRACTOR_CLASS from META_SYNC_PARTITION_FIELDS.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272991886

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b4e7ab1f3e92f0076a43953711a466f954961ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089) 
   * 34dc3ebd5324d1360333817beb251510ca301f95 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xicm commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r991048467


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")

Review Comment:
   The default value of META_SYNC_PARTITION_FIELDS is "",  I think change the default value of META_SYNC_PARTITION_EXTRACTOR_CLASS to  NonPartitionedExtractor  is more semantic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1283841116

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101",
       "triggerID" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "triggerType" : "PUSH"
     }, {
       "hash" : "87b19d8977713aef63b081813c8439dc113da637",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "87b19d8977713aef63b081813c8439dc113da637",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 34dc3ebd5324d1360333817beb251510ca301f95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101) 
   * 87b19d8977713aef63b081813c8439dc113da637 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r999206235


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
-        if (!partitionFieldsOpt.isPresent()) {
+        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   i think it does make sense to infer from META_SYNC_PARTITION_FIELDS as users can manual set it, which should align with the partition extractor. so can you make it the 1st, then fallback to table, and then key gen



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
-        if (!partitionFieldsOpt.isPresent()) {
+        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   the problem is when instantiating the hoodie sync config we cannot guarantee META_SYNC_PARTITION_FIELDS is inferred before META_SYNC_PARTITION_EXTRACTOR_CLASS. so we want to still check the table and key gen configs.



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")

Review Comment:
   @xicm the convention is we don't change default value unless there is a strong reason. we changed slash encoded extractor to multi-part extractor in 0.12.0 for handling usecases better. we don't want to change so often. besides, `MultiPartKeysValueExtractor` handles non partition case right? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
xicm commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r999298396


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
-        if (!partitionFieldsOpt.isPresent()) {
+        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   Ok.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272506976

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b4e7ab1f3e92f0076a43953711a466f954961ef Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272998450

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101",
       "triggerID" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b4e7ab1f3e92f0076a43953711a466f954961ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089) 
   * 34dc3ebd5324d1360333817beb251510ca301f95 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xicm commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272804571

   > @xicm You might add META_SYNC_PARTITION_FIELDS as the 3rd config to infer after the table config and keygen option, to keep the inference consistent. But what are the cases that the hudi table does not have the partition table configs, and you can't specify the keygen options like for the writer configs ? in another word, why would `META_SYNC_PARTITION_FIELDS` be different from the table configs or keygen options (writer configs) ? we should strive to make these consistent. that's why we introduce the inference and try to avoid the needs of manual overwrite.
   
   I find the problem when I run hive sync with `./run_sync_tool.sh  --sync-mode hms --user hive --partitioned-by partitionpath --base-path /tmp/hoodie/xicm_test --database default --table xicm_test`.  I get an Exception `org.apache.hadoop.hive.metastore.api.MetaException: Invalid partition key & values; keys [partitionpath, ], values [2020, 01, 01, ]`. The partition field is "2020/01/01". 
   
   Add parameter  `--partition-value-extractor SinglePartPartitionValueExtractor`,  the sync is successful. 
   
   Then I add some logs to see why the inference doesn't work, I find HoodieTableConfig.PARTITION_FIELDS and KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME are null. And we pass --partition-value-extractor to META_SYNC_PARTITION_FIELDS, META_SYNC_PARTITION_EXTRACTOR_CLASS can't be inferred. 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1272506022

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b4e7ab1f3e92f0076a43953711a466f954961ef UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6899:
URL: https://github.com/apache/hudi/pull/6899#issuecomment-1273247927

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12089",
       "triggerID" : "9b4e7ab1f3e92f0076a43953711a466f954961ef",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101",
       "triggerID" : "34dc3ebd5324d1360333817beb251510ca301f95",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 34dc3ebd5324d1360333817beb251510ca301f95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12101) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work

Posted by GitBox <gi...@apache.org>.
xicm commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r991048467


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")

Review Comment:
   The default value of META_SYNC_PARTITION_FIELDS is "",  I think it is more semantic to change the default value of META_SYNC_PARTITION_EXTRACTOR_CLASS to  NonPartitionedExtractor  .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xicm commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
xicm commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r999297155


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -96,8 +96,13 @@ public class HoodieSyncConfig extends HoodieConfig {
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
       .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
+        Option<String> partitionFieldsOpt;
+        if (StringUtils.nonEmpty(cfg.getString(META_SYNC_PARTITION_FIELDS))) {
+          partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   META_SYNC_PARTITION_FIELDS has default value,  the inferring can be cut off. So if META_SYNC_PARTITION_FIELDS is not empty,  we will infer from it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan merged pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #6899:
URL: https://github.com/apache/hudi/pull/6899


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org