Posted to commits@hudi.apache.org by "xushiyan (via GitHub)" <gi...@apache.org> on 2023/02/27 05:18:26 UTC

[GitHub] [hudi] xushiyan opened a new pull request, #8053: [HUDI-5853] Add infer functions to BQ sync configs

xushiyan opened a new pull request, #8053:
URL: https://github.com/apache/hudi/pull/8053

   ### Change Logs
   
   Infer these BQ sync configs
   
   - BIGQUERY_SYNC_TABLE_NAME
   - BIGQUERY_SYNC_SYNC_BASE_PATH
   - BIGQUERY_SYNC_USE_FILE_LISTING_FROM_METADATA
   - BIGQUERY_SYNC_ASSUME_DATE_PARTITIONING
   
   These configs were introduced when BQ sync was first added. They are redundant and should eventually be removed. The infer functions keep things backward compatible (BWC) and let users omit these configs.
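
   As a rough illustration of what "infer" means here, the following is a minimal, self-contained sketch and not Hudi's actual `ConfigProperty` implementation. The `Prop` class, the `resolve` helper, the example paths, and the key string used for `META_SYNC_BASE_PATH` are assumptions made for the example; only the `hoodie.gcp.bigquery.sync.base_path` key comes from the diff. The precedence shown (explicit value, then inferred value, then default) is likewise an assumption about the intended behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class InferSketch {

  /** Hypothetical stand-in for a config property; this is not Hudi's ConfigProperty. */
  static final class Prop {
    final String key;
    final Optional<String> defaultValue;
    final Function<Map<String, String>, Optional<String>> infer;

    Prop(String key, Optional<String> defaultValue,
         Function<Map<String, String>, Optional<String>> infer) {
      this.key = key;
      this.defaultValue = defaultValue;
      this.infer = infer;
    }
  }

  // Assumed key string for the generic meta sync base path.
  static final String META_SYNC_BASE_PATH = "hoodie.datasource.meta.sync.base.path";

  // Key taken from the diff; no default, value inferred from the generic key.
  static final Prop BQ_BASE_PATH = new Prop(
      "hoodie.gcp.bigquery.sync.base_path",
      Optional.empty(),
      cfg -> Optional.ofNullable(cfg.get(META_SYNC_BASE_PATH)));

  /** Assumed precedence: explicit value, then inferred value, then default. */
  static Optional<String> resolve(Map<String, String> cfg, Prop prop) {
    String explicit = cfg.get(prop.key);
    if (explicit != null) {
      return Optional.of(explicit);
    }
    Optional<String> inferred = prop.infer.apply(cfg);
    return inferred.isPresent() ? inferred : prop.defaultValue;
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new HashMap<>();
    // Only the generic key is set, so the BQ-specific value is inferred from it.
    cfg.put(META_SYNC_BASE_PATH, "gs://bucket/table");
    System.out.println(resolve(cfg, BQ_BASE_PATH).orElse("<unset>"));

    // An explicitly set BQ-specific value takes priority over the inferred one.
    cfg.put(BQ_BASE_PATH.key, "gs://bucket/other");
    System.out.println(resolve(cfg, BQ_BASE_PATH).orElse("<unset>"));
  }
}
```

   Under these assumptions, existing jobs that set the BQ-specific key keep their value, while new jobs can set only the generic key.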
   
   ### Impact
   
   Reduce configs for running BQ sync.
   
   ### Risk level
   
   Low.
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #8053:
URL: https://github.com/apache/hudi/pull/8053#discussion_r1119292413


##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java:
##########
@@ -75,26 +87,32 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
   public static final ConfigProperty<String> BIGQUERY_SYNC_SYNC_BASE_PATH = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.base_path")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(META_SYNC_BASE_PATH)))
       .withDocumentation("Base path of the hoodie table to sync");
 
   public static final ConfigProperty<String> BIGQUERY_SYNC_PARTITION_FIELDS = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.partition_fields")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
+          .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME))))
       .withDocumentation("Comma-delimited partition fields. Default to non-partitioned.");
 
   public static final ConfigProperty<Boolean> BIGQUERY_SYNC_USE_FILE_LISTING_FROM_METADATA = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.use_file_listing_from_metadata")
-      .defaultValue(false)
+      .defaultValue(DEFAULT_METADATA_ENABLE_FOR_READERS)
+      .withInferFunction(cfg -> Option.of(cfg.getBooleanOrDefault(HoodieMetadataConfig.ENABLE, DEFAULT_METADATA_ENABLE_FOR_READERS)))

Review Comment:
   We expect this to be set explicitly by the user. I guess we don't have much of an option here, so we can go with this for now. We are good to go.



##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java:
##########
@@ -75,26 +87,32 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
   public static final ConfigProperty<String> BIGQUERY_SYNC_SYNC_BASE_PATH = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.base_path")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(META_SYNC_BASE_PATH)))
       .withDocumentation("Base path of the hoodie table to sync");
 
   public static final ConfigProperty<String> BIGQUERY_SYNC_PARTITION_FIELDS = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.partition_fields")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))

Review Comment:
   should not be an issue. we are good





[GitHub] [hudi] hudi-bot commented on pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8053:
URL: https://github.com/apache/hudi/pull/8053#issuecomment-1447426518

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424",
       "triggerID" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e9403ca1169e910412d20bfa88ad864a48daaa9c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15448",
       "triggerID" : "e9403ca1169e910412d20bfa88ad864a48daaa9c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e9403ca1169e910412d20bfa88ad864a48daaa9c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15448) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8053:
URL: https://github.com/apache/hudi/pull/8053#issuecomment-1446064612

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424",
       "triggerID" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1bda73008e50ab3d4fcfb34c2208d8df3b8d7318 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424) 
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #8053:
URL: https://github.com/apache/hudi/pull/8053#discussion_r1118294290


##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java:
##########
@@ -75,21 +85,26 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
   public static final ConfigProperty<String> BIGQUERY_SYNC_SYNC_BASE_PATH = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.base_path")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(META_SYNC_BASE_PATH)))
       .withDocumentation("Base path of the hoodie table to sync");
 
   public static final ConfigProperty<String> BIGQUERY_SYNC_PARTITION_FIELDS = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.partition_fields")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
+          .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME))))
       .withDocumentation("Comma-delimited partition fields. Default to non-partitioned.");
 
   public static final ConfigProperty<Boolean> BIGQUERY_SYNC_USE_FILE_LISTING_FROM_METADATA = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.use_file_listing_from_metadata")
-      .defaultValue(false)
+      .defaultValue(DEFAULT_METADATA_ENABLE_FOR_READERS)
+      .withInferFunction(cfg -> Option.of(cfg.getBooleanOrDefault(HoodieMetadataConfig.ENABLE, DEFAULT_METADATA_ENABLE_FOR_READERS)))
       .withDocumentation("Fetch file listing from Hudi's metadata");
 
-  public static final ConfigProperty<Boolean> BIGQUERY_SYNC_ASSUME_DATE_PARTITIONING = ConfigProperty
+  public static final ConfigProperty<String> BIGQUERY_SYNC_ASSUME_DATE_PARTITIONING = ConfigProperty

Review Comment:
   This is compatible. Changing the type to String aligns it with the existing root config `HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`.
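
   One way to read the compatibility claim, assuming user-supplied values reach the config as strings (as they do with `java.util.Properties`): the declared type changes, but the text a user writes, such as `true`, is parsed the same way. The sketch below is only an illustration under that assumption and is not Hudi's config code; the key `example.sync.assume_date_partitioning` and the `readFlag` helper are hypothetical.

```java
import java.util.Properties;

public class StringTypedFlag {
  // Hypothetical key, used only for this illustration.
  static final String KEY = "example.sync.assume_date_partitioning";

  /** Reads the flag from string-valued properties, falling back to a String default. */
  static boolean readFlag(Properties props, String defaultValue) {
    return Boolean.parseBoolean(props.getProperty(KEY, defaultValue));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Nothing set: the String default "false" is parsed into a boolean.
    System.out.println(readFlag(props, "false"));

    // The user still writes the same text as before the type change.
    props.setProperty(KEY, "true");
    System.out.println(readFlag(props, "false"));
  }
}
```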





[GitHub] [hudi] hudi-bot commented on pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8053:
URL: https://github.com/apache/hudi/pull/8053#issuecomment-1445733667

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1bda73008e50ab3d4fcfb34c2208d8df3b8d7318 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8053:
URL: https://github.com/apache/hudi/pull/8053#issuecomment-1445738498

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424",
       "triggerID" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1bda73008e50ab3d4fcfb34c2208d8df3b8d7318 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424) 
   




[GitHub] [hudi] xushiyan merged pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan merged PR #8053:
URL: https://github.com/apache/hudi/pull/8053




[GitHub] [hudi] nsivabalan commented on a diff in pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #8053:
URL: https://github.com/apache/hudi/pull/8053#discussion_r1119186983


##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java:
##########
@@ -75,26 +87,32 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
   public static final ConfigProperty<String> BIGQUERY_SYNC_SYNC_BASE_PATH = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.base_path")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(META_SYNC_BASE_PATH)))
       .withDocumentation("Base path of the hoodie table to sync");
 
   public static final ConfigProperty<String> BIGQUERY_SYNC_PARTITION_FIELDS = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.partition_fields")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
+          .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME))))
       .withDocumentation("Comma-delimited partition fields. Default to non-partitioned.");
 
   public static final ConfigProperty<Boolean> BIGQUERY_SYNC_USE_FILE_LISTING_FROM_METADATA = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.use_file_listing_from_metadata")
-      .defaultValue(false)
+      .defaultValue(DEFAULT_METADATA_ENABLE_FOR_READERS)
+      .withInferFunction(cfg -> Option.of(cfg.getBooleanOrDefault(HoodieMetadataConfig.ENABLE, DEFAULT_METADATA_ENABLE_FOR_READERS)))

Review Comment:
   Not sure if this is right: we are fetching from a write config to deduce the read-side value for enabling metadata.



##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java:
##########
@@ -75,26 +87,32 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable
   public static final ConfigProperty<String> BIGQUERY_SYNC_SYNC_BASE_PATH = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.base_path")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(META_SYNC_BASE_PATH)))
       .withDocumentation("Base path of the hoodie table to sync");
 
   public static final ConfigProperty<String> BIGQUERY_SYNC_PARTITION_FIELDS = ConfigProperty
       .key("hoodie.gcp.bigquery.sync.partition_fields")
       .noDefaultValue()
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))

Review Comment:
   In case of a custom key generator, I guess the format of tableConfig.PARTITION_FIELDS and KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME might differ.
   For example: `simple:col1` vs `col1`.
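
   To make the concern concrete, here is a small sketch of an ordered fallback that mirrors the shape of the infer function in the diff. It is not Hudi code: the class, the helper name, and both key strings are assumptions, and which source carries which format is chosen arbitrarily for the example. The point is only that the first source that is set is returned verbatim, so the inferred value may arrive in either format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PartitionFieldsFallback {
  // Key strings are assumptions for illustration.
  static final String TABLE_PARTITION_FIELDS = "hoodie.table.partition.fields";
  static final String KEYGEN_PARTITION_PATH = "hoodie.datasource.write.partitionpath.field";

  /** Returns the first source that is set, verbatim; no format normalization happens. */
  static Optional<String> inferPartitionFields(Map<String, String> cfg) {
    String fromTable = cfg.get(TABLE_PARTITION_FIELDS);
    if (fromTable != null) {
      return Optional.of(fromTable);
    }
    return Optional.ofNullable(cfg.get(KEYGEN_PARTITION_PATH));
  }

  public static void main(String[] args) {
    // Only the key generator option is set, in the typed format from the comment.
    Map<String, String> onlyKeyGen = new HashMap<>();
    onlyKeyGen.put(KEYGEN_PARTITION_PATH, "simple:col1");
    System.out.println(inferPartitionFields(onlyKeyGen).orElse("<non-partitioned>"));

    // Both are set: the table config wins and the plain format is returned.
    Map<String, String> both = new HashMap<>(onlyKeyGen);
    both.put(TABLE_PARTITION_FIELDS, "col1");
    System.out.println(inferPartitionFields(both).orElse("<non-partitioned>"));
  }
}
```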
   





[GitHub] [hudi] hudi-bot commented on pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8053:
URL: https://github.com/apache/hudi/pull/8053#issuecomment-1447003474

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424",
       "triggerID" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e9403ca1169e910412d20bfa88ad864a48daaa9c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15448",
       "triggerID" : "e9403ca1169e910412d20bfa88ad864a48daaa9c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1bda73008e50ab3d4fcfb34c2208d8df3b8d7318 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424) 
   * e9403ca1169e910412d20bfa88ad864a48daaa9c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15448) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8053: [HUDI-5853] Add infer functions to BQ sync configs

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8053:
URL: https://github.com/apache/hudi/pull/8053#issuecomment-1446919252

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424",
       "triggerID" : "1bda73008e50ab3d4fcfb34c2208d8df3b8d7318",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e9403ca1169e910412d20bfa88ad864a48daaa9c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e9403ca1169e910412d20bfa88ad864a48daaa9c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1bda73008e50ab3d4fcfb34c2208d8df3b8d7318 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15424) 
   * e9403ca1169e910412d20bfa88ad864a48daaa9c UNKNOWN
   

