You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/14 10:48:20 UTC

[GitHub] [hudi] UZi5136225 opened a new pull request #1968: [HUDI-1192]

UZi5136225 opened a new pull request #1968:
URL: https://github.com/apache/hudi/pull/1968


   Fix the failure of creating hive database
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Optimization the failure of creating hive database
   And cause repeated printing of warn logs
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r491761859



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -290,6 +290,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.auto.create.database"

Review comment:
       Following the existing style, it would be better `hoodie.datasource.hive_sync.auto_create_database`. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Optimization the failure of creating hive database

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r472914790



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -282,6 +282,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.create.database"

Review comment:
       Okay, I am finished, please review
   @yanghua 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r495515624



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -336,6 +337,22 @@ public boolean doesTableExist(String tableName) {
     }
   }
 
+  /**
+   * @param databaseName
+   * @return  true if the configured database exists
+   */
+  public boolean doesDataBaseExist(String databaseName) {

Review comment:
       What do you think？ if checkDatabaseExist is better, I am very willing to modify
   @yanghua 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r492448849



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -290,6 +290,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.auto.create.database"

Review comment:
       ok,thanks for the suggestion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r480492617



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
     boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
     // check if the database exists else create it
-    try {
-      hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);
-    } catch (Exception e) {
-      // this is harmless since table creation will fail anyways, creation of DB is needed for in-memory testing
-      LOG.warn("Unable to create database", e);
+    if (cfg.enableCreateDatabase) {
+      try {
+        hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);

Review comment:
       Yes, I have discussed with @liujinhui1994 off-line about this issue and asked him to find a way to check whether the database exists or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r474348589



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Create hive database")

Review comment:
       "Create hive database" -> "Enable create hive database"

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
     boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
     // check if the database exists else create it
-    try {
-      hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);
-    } catch (Exception e) {
-      // this is harmless since table creation will fail anyways, creation of DB is needed for in-memory testing
-      LOG.warn("Unable to create database", e);
+    if (cfg.enableCreateDatabase) {
+      try {
+        hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);

Review comment:
       I still have a question here. If we do not allow initializing hive database. Or if the database does not exist. Would affect the subsequent business logic?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r474436197



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
     boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
     // check if the database exists else create it
-    try {
-      hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);
-    } catch (Exception e) {
-      // this is harmless since table creation will fail anyways, creation of DB is needed for in-memory testing
-      LOG.warn("Unable to create database", e);
+    if (cfg.enableCreateDatabase) {
+      try {
+        hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);

Review comment:
       The choice to create a database here is because hive database is usually only created by hive users, and other users have no permission to create hive tables.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Optimization the failure of creating hive database

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r471273476



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -282,6 +282,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.create.database"

Review comment:
       Can you update the key so that it could describe the thing: initialize the database automatically?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] wangxianghu commented on pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

wangxianghu commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-687061567


   > > Currently, In an ETL pipeline(eg, kafka -> hudi -> hive), If the process of hudi to hive failed, the job is still running.
   > > I think we can add a switch to control the job behavior(fail or keep running) when kafka to hudi is ok, while hudi to hive failed, leave the choice to user. since ingesting data to hudi and sync to hive is a complete task in some scenes.
   > > WDYT? @vinothchandar @yanghua @liujinhui1994
   > 
   > sounds reasonable.
   
   Filed a ticket here:https://issues.apache.org/jira/projects/HUDI/issues/HUDI-1269


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] UZi5136225 commented on pull request #1968: [HUDI-1192] Optimization the failure of creating hive database

Posted by GitBox <gi...@apache.org>.

UZi5136225 commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-674017690


   @yanghua 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r474437973



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Create hive database")

Review comment:
       ok




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] vinothchandar commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r484615721



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -282,6 +282,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_CREATE_DATABASE_ENABLED_OPT_KEY = "hoodie.datasource.hive_sync.create.database.enable"

Review comment:
       just bumping this. I think this naming is not more self-explanatory @liujinhui1994  wdyt? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r491761859



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -290,6 +290,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.auto.create.database"

Review comment:
       Following the existing style, it would be better `hoodie.datasource.hive_sync.auto_create_database`. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r495515127



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -336,6 +337,22 @@ public boolean doesTableExist(String tableName) {
     }
   }
 
+  /**
+   * @param databaseName
+   * @return  true if the configured database exists
+   */
+  public boolean doesDataBaseExist(String databaseName) {

Review comment:
       It's an open topic. IMHO, `checkDatabaseExist` is a better method name. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] vinothchandar commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r480260405



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -282,6 +282,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_CREATE_DATABASE_ENABLED_OPT_KEY = "hoodie.datasource.hive_sync.create.database.enable"

Review comment:
       rename this well to `.auto.create.database` ? 

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
     boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
     // check if the database exists else create it
-    try {
-      hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);
-    } catch (Exception e) {
-      // this is harmless since table creation will fail anyways, creation of DB is needed for in-memory testing
-      LOG.warn("Unable to create database", e);
+    if (cfg.enableCreateDatabase) {
+      try {
+        hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);

Review comment:
       Should we just throw an error and exit if `enableCreateDatabase=false` and the database does not exist? 

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
       rename: autoCreateDatabase ? 

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
       Also should the value be `true` by default to preserve existing behavior




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r495515486



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -336,6 +337,22 @@ public boolean doesTableExist(String tableName) {
     }
   }
 
+  /**
+   * @param databaseName
+   * @return  true if the configured database exists
+   */
+  public boolean doesDataBaseExist(String databaseName) {

Review comment:
       I refer to other names in the source code that also have similar functions
   ![749cea6a2c22aeb14200eb89bfd4bde](https://user-images.githubusercontent.com/25769285/94353758-73f66c80-00a7-11eb-96a6-1c2d7cbf8388.png)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] vinothchandar commented on pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-688580026


   >If the process of hudi to hive failed, the job is still running.
   I think we can add a switch to control the job behavior(fail or keep running) 
   
   +1 to have this configurable 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r485910367



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
     boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
     // check if the database exists else create it
-    try {
-      hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);
-    } catch (Exception e) {
-      // this is harmless since table creation will fail anyways, creation of DB is needed for in-memory testing
-      LOG.warn("Unable to create database", e);
+    if (cfg.enableCreateDatabase) {
+      try {
+        hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);

Review comment:
       +1 on throwing the error. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r485909624



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
       @vinothchandar We do not let hudi create databases by default. So false seems to be ok :) @bvaradar to chime in here. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r485921982



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
       https://lists.apache.org/thread.html/e1b7f97c774e1d7d7fc54fbb46db49aaf2e217303a50d9885150242d%40%3Cdev.hudi.apache.org%3E - this is what I am referring to. :) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-692035393


   Please help review again,thanks 
   @yanghua @vinothchandar 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] wangxianghu commented on pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

wangxianghu commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-686191782


   Currently, In an ETL pipeline(eg, kafka -> hudi -> hive), If the process of hudi to hive failed,  the job is still running.
   I think we can add a switch to control the job behavior(fail or keep running) when kafka to hudi is ok, while hudi to hive failed, leave the choice to user. since ingesting data to hudi and sync to hive is a complete task in some scenes.
   WDYT? @vinothchandar @yanghua @liujinhui1994 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r495533937



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -336,6 +337,22 @@ public boolean doesTableExist(String tableName) {
     }
   }
 
+  /**
+   * @param databaseName
+   * @return  true if the configured database exists
+   */
+  public boolean doesDataBaseExist(String databaseName) {

Review comment:
       > What do you think？ if checkDatabaseExist is better, I am very willing to modify
   > @yanghua
   
   OK, keep the current name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r492448849



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -290,6 +290,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.auto.create.database"

Review comment:
       ok,thanks for the suggestion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r487915975



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
       Ok, my bad. Missed changes for creation of hive database. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-688749502


   Ok i will deal with this soon


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#issuecomment-686426501


   > Currently, In an ETL pipeline(eg, kafka -> hudi -> hive), If the process of hudi to hive failed, the job is still running.
   > I think we can add a switch to control the job behavior(fail or keep running) when kafka to hudi is ok, while hudi to hive failed, leave the choice to user. since ingesting data to hudi and sync to hive is a complete task in some scenes.
   > WDYT? @vinothchandar @yanghua @liujinhui1994
   
   sounds reasonable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua merged pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua merged pull request #1968:
URL: https://github.com/apache/hudi/pull/1968


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

liujinhui1994 commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r492448849



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -290,6 +290,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync.auto.create.database"

Review comment:
       ok,thanks for the suggestion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yanghua commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

Posted by GitBox <gi...@apache.org>.

yanghua commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r495422071



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -290,6 +290,7 @@ object DataSourceWriteOptions {
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = "hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = "hoodie.datasource.hive_sync.use_pre_apache_input_format"
   val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
+  val HIVE_AUTO_CREATE_DATABASE_OPT_KEY = "hoodie.datasource.hive_sync_auto_create_database"

Review comment:
       Can you follow the other pattern??? `hoodie.datasource.hive_sync.xxx` 
   Check it carefully, please.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org