You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/05 17:09:14 UTC

[GitHub] [hudi] fengjian428 opened a new pull request, #5754: [WIP][HUDI-3730] Improve hive/meta sync class design and hierachies

fengjian428 opened a new pull request, #5754:
URL: https://github.com/apache/hudi/pull/5754

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   1. for the engines which need to use MetaSync, should implement SupportMetaSync on the sync classes, such as DeltaSync, KafkaConnectTransactionServices and etc. for example runMetaSync(); then will sync metadata by every SyncToolClasses which indicated in config
   2. redesign AbstractSyncClient and AbstractSyncTool, and add Catalog Interface. make the hierarchy of classes more clearly and more precisely
   3. unify the way to generate SyncConfig and the way to call SyncTool, remove some useless parameters
   4. User can choose whether sync partitions information to Catalog or not
   
   
   refer to [RFC55](https://github.com/apache/hudi/pull/5695)
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan merged pull request #5754: [HUDI-3730] Improve hive/meta sync class design and hierarchies

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #5754:
URL: https://github.com/apache/hudi/pull/5754


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5754: [HUDI-3730] Improve meta sync class design and hierarchies

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5754:
URL: https://github.com/apache/hudi/pull/5754#discussion_r895295348


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:
##########
@@ -24,79 +24,14 @@
 
 import com.beust.jcommander.Parameter;
 
+import java.io.Serializable;
+
 /**
  * Configs needed to sync data into the Hive Metastore.
  */
 public class HiveSyncConfig extends HoodieSyncConfig {
 
-  @Parameter(names = {"--user"}, description = "Hive username")
-  public String hiveUser;
-
-  @Parameter(names = {"--pass"}, description = "Hive password")
-  public String hivePass;
-
-  @Parameter(names = {"--jdbc-url"}, description = "Hive jdbc connect url")
-  public String jdbcUrl;
-
-  @Parameter(names = {"--metastore-uris"}, description = "Hive metastore uris")
-  public String metastoreUris;
-
-  @Parameter(names = {"--use-pre-apache-input-format"},
-      description = "Use InputFormat under com.uber.hoodie package "
-          + "instead of org.apache.hudi package. Use this when you are in the process of migrating from "
-          + "com.uber.hoodie to org.apache.hudi. Stop using this after you migrated the table definition to "
-          + "org.apache.hudi input format.")
-  public Boolean usePreApacheInputFormat;
-
-  @Parameter(names = {"--bucket-spec"}, description = "bucket spec stored in metastore", required = false)
-  public String bucketSpec;
-
-  @Deprecated
-  @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
-  public Boolean useJdbc;
-
-  @Parameter(names = {"--sync-mode"}, description = "Mode to choose for Hive ops. Valid values are hms,glue,jdbc and hiveql")
-  public String syncMode;
-
-  @Parameter(names = {"--auto-create-database"}, description = "Auto create hive database")
-  public Boolean autoCreateDatabase;
-
-  @Parameter(names = {"--ignore-exceptions"}, description = "Ignore hive exceptions")
-  public Boolean ignoreExceptions;
-
-  @Parameter(names = {"--skip-ro-suffix"}, description = "Skip the `_ro` suffix for Read optimized table, when registering")
-  public Boolean skipROSuffix;
-
-  @Parameter(names = {"--table-properties"}, description = "Table properties to hive table")
-  public String tableProperties;
-
-  @Parameter(names = {"--serde-properties"}, description = "Serde properties to hive table")
-  public String serdeProperties;
-
-  @Parameter(names = {"--help", "-h"}, help = true)
-  public Boolean help = false;
-
-  @Parameter(names = {"--support-timestamp"}, description = "'INT64' with original type TIMESTAMP_MICROS is converted to hive 'timestamp' type."
-      + "Disabled by default for backward compatibility.")
-  public Boolean supportTimestamp;
-
-  @Parameter(names = {"--managed-table"}, description = "Create a managed table")
-  public Boolean createManagedTable;
-
-  @Parameter(names = {"--batch-sync-num"}, description = "The number of partitions one batch when synchronous partitions to hive")
-  public Integer batchSyncNum;
-
-  @Parameter(names = {"--spark-datasource"}, description = "Whether sync this table as spark data source table.")
-  public Boolean syncAsSparkDataSourceTable;
-
-  @Parameter(names = {"--spark-schema-length-threshold"}, description = "The maximum length allowed in a single cell when storing additional schema information in Hive's metastore.")
-  public int sparkSchemaLengthThreshold;
-
-  @Parameter(names = {"--with-operation-field"}, description = "Whether to include the '_hoodie_operation' field in the metadata fields")
-  public Boolean withOperationField = false;
-
-  @Parameter(names = {"--sync-comment"}, description = "synchronize table comments to hive")
-  public boolean syncComment = false;
+  public final HiveSyncConfigParams hiveSyncConfigParams = new HiveSyncConfigParams();
 

Review Comment:
   Can we avoid introducing nested POJOs ? The POJO brings in too many modifications but no benefit here ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #5754: [HUDI-3730] Improve meta sync class design and hierarchies

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #5754:
URL: https://github.com/apache/hudi/pull/5754#discussion_r895216045


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/operation/CatalogSync.java:
##########
@@ -0,0 +1,32 @@
+package org.apache.hudi.sync.common.operation;

Review Comment:
   @fengjian428 you'd need to configure IDE to auto add license to source files see https://hudi.apache.org/contribute/developer-setup



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #5754: [HUDI-3730] Improve meta sync class design and hierarchies

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #5754:
URL: https://github.com/apache/hudi/pull/5754#discussion_r895303808


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:
##########
@@ -24,79 +24,14 @@
 
 import com.beust.jcommander.Parameter;
 
+import java.io.Serializable;
+
 /**
  * Configs needed to sync data into the Hive Metastore.
  */
 public class HiveSyncConfig extends HoodieSyncConfig {
 
-  @Parameter(names = {"--user"}, description = "Hive username")
-  public String hiveUser;
-
-  @Parameter(names = {"--pass"}, description = "Hive password")
-  public String hivePass;
-
-  @Parameter(names = {"--jdbc-url"}, description = "Hive jdbc connect url")
-  public String jdbcUrl;
-
-  @Parameter(names = {"--metastore-uris"}, description = "Hive metastore uris")
-  public String metastoreUris;
-
-  @Parameter(names = {"--use-pre-apache-input-format"},
-      description = "Use InputFormat under com.uber.hoodie package "
-          + "instead of org.apache.hudi package. Use this when you are in the process of migrating from "
-          + "com.uber.hoodie to org.apache.hudi. Stop using this after you migrated the table definition to "
-          + "org.apache.hudi input format.")
-  public Boolean usePreApacheInputFormat;
-
-  @Parameter(names = {"--bucket-spec"}, description = "bucket spec stored in metastore", required = false)
-  public String bucketSpec;
-
-  @Deprecated
-  @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
-  public Boolean useJdbc;
-
-  @Parameter(names = {"--sync-mode"}, description = "Mode to choose for Hive ops. Valid values are hms,glue,jdbc and hiveql")
-  public String syncMode;
-
-  @Parameter(names = {"--auto-create-database"}, description = "Auto create hive database")
-  public Boolean autoCreateDatabase;
-
-  @Parameter(names = {"--ignore-exceptions"}, description = "Ignore hive exceptions")
-  public Boolean ignoreExceptions;
-
-  @Parameter(names = {"--skip-ro-suffix"}, description = "Skip the `_ro` suffix for Read optimized table, when registering")
-  public Boolean skipROSuffix;
-
-  @Parameter(names = {"--table-properties"}, description = "Table properties to hive table")
-  public String tableProperties;
-
-  @Parameter(names = {"--serde-properties"}, description = "Serde properties to hive table")
-  public String serdeProperties;
-
-  @Parameter(names = {"--help", "-h"}, help = true)
-  public Boolean help = false;
-
-  @Parameter(names = {"--support-timestamp"}, description = "'INT64' with original type TIMESTAMP_MICROS is converted to hive 'timestamp' type."
-      + "Disabled by default for backward compatibility.")
-  public Boolean supportTimestamp;
-
-  @Parameter(names = {"--managed-table"}, description = "Create a managed table")
-  public Boolean createManagedTable;
-
-  @Parameter(names = {"--batch-sync-num"}, description = "The number of partitions one batch when synchronous partitions to hive")
-  public Integer batchSyncNum;
-
-  @Parameter(names = {"--spark-datasource"}, description = "Whether sync this table as spark data source table.")
-  public Boolean syncAsSparkDataSourceTable;
-
-  @Parameter(names = {"--spark-schema-length-threshold"}, description = "The maximum length allowed in a single cell when storing additional schema information in Hive's metastore.")
-  public int sparkSchemaLengthThreshold;
-
-  @Parameter(names = {"--with-operation-field"}, description = "Whether to include the '_hoodie_operation' field in the metadata fields")
-  public Boolean withOperationField = false;
-
-  @Parameter(names = {"--sync-comment"}, description = "synchronize table comments to hive")
-  public boolean syncComment = false;
+  public final HiveSyncConfigParams hiveSyncConfigParams = new HiveSyncConfigParams();
 

Review Comment:
   @danny0405 @fengjian428 working on a follow-up PR to align with the RFC



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org