You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2021/01/05 20:11:32 UTC

[GitHub] [incubator-gobblin] ZihanLi58 opened a new pull request #3181: [GOBBLIN-1344] Enable configurable schema source db and table when registering hive schema

ZihanLi58 opened a new pull request #3181:
URL: https://github.com/apache/incubator-gobblin/pull/3181


   …gistering hive schema
   
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-1344
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if applicable):
   We need to set schema.literal for hive table, but in ORC compaction pipeline, there is no context for avro schema. So for de-duped DB which is registered by compaction pipeline, we need to be able to configure the schema source db/table to get the avro schema registered by data ingestion pipeline
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #3181: [GOBBLIN-1344] Enable configurable schema source db and table when registering hive schema

Posted by GitBox <gi...@apache.org>.
ZihanLi58 commented on a change in pull request #3181:
URL: https://github.com/apache/incubator-gobblin/pull/3181#discussion_r547524434



##########
File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
##########
@@ -280,8 +291,15 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri
         try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) {
           existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName));
         }
+        HiveTable schemaSourceTable = existingTable;
+        if (state.contains(SCHEMA_SOURCE_DB) || state.contains(SCHEMA_SOURCE_TABLE)) {

Review comment:
       In both compaction pipeline and ingestion pipeline, this state is topic specific state. And for this case that db is registered as an additional db, the db name is specified by a job level config. So I think for all the tables in this db, it should has the same source db. That's why I do want to make this config a job level config also. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] ZihanLi58 commented on a change in pull request #3181: [GOBBLIN-1344] Enable configurable schema source db and table when registering hive schema

Posted by GitBox <gi...@apache.org>.
ZihanLi58 commented on a change in pull request #3181:
URL: https://github.com/apache/incubator-gobblin/pull/3181#discussion_r547523532



##########
File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
##########
@@ -93,6 +93,8 @@
 
   public static final String HIVE_REGISTER_METRICS_PREFIX = "hiveRegister.";
   public static final String ADD_PARTITION_TIMER = HIVE_REGISTER_METRICS_PREFIX + "addPartitionTimerTimer";
+  public static final String SCHEMA_SOURCE_DB = HIVE_REGISTER_METRICS_PREFIX + "schema.source.dbName";
+  public static final String SCHEMA_SOURCE_TABLE = HIVE_REGISTER_METRICS_PREFIX + "schema.source.tableName";

Review comment:
       Will address this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #3181: [GOBBLIN-1344] Enable configurable schema source db and table when registering hive schema

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #3181:
URL: https://github.com/apache/incubator-gobblin/pull/3181#discussion_r547498669



##########
File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
##########
@@ -280,8 +291,15 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri
         try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) {
           existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName));
         }
+        HiveTable schemaSourceTable = existingTable;
+        if (state.contains(SCHEMA_SOURCE_DB) || state.contains(SCHEMA_SOURCE_TABLE)) {
+          try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) {

Review comment:
       timer shouldn't be named after `GET_HIVE_TABLE` here. 

##########
File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
##########
@@ -93,6 +93,8 @@
 
   public static final String HIVE_REGISTER_METRICS_PREFIX = "hiveRegister.";
   public static final String ADD_PARTITION_TIMER = HIVE_REGISTER_METRICS_PREFIX + "addPartitionTimerTimer";
+  public static final String SCHEMA_SOURCE_DB = HIVE_REGISTER_METRICS_PREFIX + "schema.source.dbName";
+  public static final String SCHEMA_SOURCE_TABLE = HIVE_REGISTER_METRICS_PREFIX + "schema.source.tableName";

Review comment:
       I would rather we getting rid of source table here for simplicity (and add javadoc indicating our assumption that the actual source table will be named the same with only database name being different). wdyt ? 

##########
File path: gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreBasedRegister.java
##########
@@ -280,8 +291,15 @@ private boolean ensureHiveTableExistenceBeforeAlternation(String tableName, Stri
         try (Timer.Context context = this.metricContext.timer(GET_HIVE_TABLE).time()) {
           existingTable = HiveMetaStoreUtils.getHiveTable(client.getTable(dbName, tableName));
         }
+        HiveTable schemaSourceTable = existingTable;
+        if (state.contains(SCHEMA_SOURCE_DB) || state.contains(SCHEMA_SOURCE_TABLE)) {

Review comment:
       Should the information of source DB coming from HiveSpec instead of `state`?  IIRC `state` will be applied for all paths' registration here, it is more like a global state but not for a specific table/db




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] asfgit closed pull request #3181: [GOBBLIN-1344] Enable configurable schema source db and table when registering hive schema

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3181:
URL: https://github.com/apache/incubator-gobblin/pull/3181


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-gobblin] ZihanLi58 closed pull request #3181: [GOBBLIN-1344] Enable configurable schema source db and table when registering hive schema

Posted by GitBox <gi...@apache.org>.
ZihanLi58 closed pull request #3181:
URL: https://github.com/apache/incubator-gobblin/pull/3181


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org