You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/27 22:59:53 UTC

[GitHub] [hudi] zhedoubushishi opened a new pull request, #5987: [HUDI-4331] Allow loading external config file from class loader

zhedoubushishi opened a new pull request, #5987:
URL: https://github.com/apache/hudi/pull/5987

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   This PR aims to let Hudi external config support Spark cluster mode. In Spark cluster mode, the Spark driver would run on a random node within the cluster. Hudi external config file only exists in the master node and this would cause a file not found issue when the Spark driver is not running on the master node.
   
   In this PR, when loading the external config file, it would first try to load the hudi-defaults.conf from the class loader. To add hudi-defaults.conf to the Spark class loader, similar to hive-site.xml, we can add hudi-defaults.conf through spark.yarn.dist.files option.
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] zhedoubushishi commented on a diff in pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on code in PR #5987:
URL: https://github.com/apache/hudi/pull/5987#discussion_r910376337


##########
hudi-common/src/main/java/org/apache/hudi/common/config/DFSPropertiesConfiguration.java:
##########
@@ -90,11 +91,24 @@ public DFSPropertiesConfiguration() {
   }
 
   /**
-   * Load global props from hudi-defaults.conf which is under CONF_FILE_DIR_ENV_NAME.
+   * Load global props from hudi-defaults.conf which is under class loader or CONF_FILE_DIR_ENV_NAME.
    * @return Typed Properties
    */
   public static TypedProperties loadGlobalProps() {
     DFSPropertiesConfiguration conf = new DFSPropertiesConfiguration();
+
+    // First try loading the external config file from class loader
+    URL configFile = Thread.currentThread().getContextClassLoader().getResource(DEFAULT_PROPERTIES_FILE);
+    if (configFile != null) {
+      try (BufferedReader br = new BufferedReader(new InputStreamReader(configFile.openStream()))) {
+        conf.addPropsFromStream(br);
+        return conf.getProps();
+      } catch (IOException ioe) {
+        throw new HoodieIOException(

Review Comment:
   Thanks for taking a review. I feel like if a customer sets an external hudi config file but somehow Hudi fails to read it, we should terminate the job rather than continuing running it. Because this would be a job with unexpected configs. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on a diff in pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #5987:
URL: https://github.com/apache/hudi/pull/5987#discussion_r910220315


##########
hudi-common/src/main/java/org/apache/hudi/common/config/DFSPropertiesConfiguration.java:
##########
@@ -90,11 +91,24 @@ public DFSPropertiesConfiguration() {
   }
 
   /**
-   * Load global props from hudi-defaults.conf which is under CONF_FILE_DIR_ENV_NAME.
+   * Load global props from hudi-defaults.conf which is under class loader or CONF_FILE_DIR_ENV_NAME.
    * @return Typed Properties
    */
   public static TypedProperties loadGlobalProps() {
     DFSPropertiesConfiguration conf = new DFSPropertiesConfiguration();
+
+    // First try loading the external config file from class loader
+    URL configFile = Thread.currentThread().getContextClassLoader().getResource(DEFAULT_PROPERTIES_FILE);
+    if (configFile != null) {
+      try (BufferedReader br = new BufferedReader(new InputStreamReader(configFile.openStream()))) {
+        conf.addPropsFromStream(br);
+        return conf.getProps();
+      } catch (IOException ioe) {
+        throw new HoodieIOException(

Review Comment:
   Should this emit a warning message instead of erroring out?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5987:
URL: https://github.com/apache/hudi/pull/5987#issuecomment-1168033142

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7f26e9d2ab6d19cc842a9a69e486751c6b7400e6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7f26e9d2ab6d19cc842a9a69e486751c6b7400e6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f26e9d2ab6d19cc842a9a69e486751c6b7400e6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua merged pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
yihua merged PR #5987:
URL: https://github.com/apache/hudi/pull/5987


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on a diff in pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #5987:
URL: https://github.com/apache/hudi/pull/5987#discussion_r910500003


##########
hudi-common/src/main/java/org/apache/hudi/common/config/DFSPropertiesConfiguration.java:
##########
@@ -90,11 +91,24 @@ public DFSPropertiesConfiguration() {
   }
 
   /**
-   * Load global props from hudi-defaults.conf which is under CONF_FILE_DIR_ENV_NAME.
+   * Load global props from hudi-defaults.conf which is under class loader or CONF_FILE_DIR_ENV_NAME.
    * @return Typed Properties
    */
   public static TypedProperties loadGlobalProps() {
     DFSPropertiesConfiguration conf = new DFSPropertiesConfiguration();
+
+    // First try loading the external config file from class loader
+    URL configFile = Thread.currentThread().getContextClassLoader().getResource(DEFAULT_PROPERTIES_FILE);
+    if (configFile != null) {
+      try (BufferedReader br = new BufferedReader(new InputStreamReader(configFile.openStream()))) {
+        conf.addPropsFromStream(br);
+        return conf.getProps();
+      } catch (IOException ioe) {
+        throw new HoodieIOException(

Review Comment:
   Got it, makes sense.  So the read of configs only happens when the external Hudi config file is present.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5987:
URL: https://github.com/apache/hudi/pull/5987#issuecomment-1168035562

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7f26e9d2ab6d19cc842a9a69e486751c6b7400e6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9593",
       "triggerID" : "7f26e9d2ab6d19cc842a9a69e486751c6b7400e6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f26e9d2ab6d19cc842a9a69e486751c6b7400e6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9593) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5987:
URL: https://github.com/apache/hudi/pull/5987#issuecomment-1168108594

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7f26e9d2ab6d19cc842a9a69e486751c6b7400e6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9593",
       "triggerID" : "7f26e9d2ab6d19cc842a9a69e486751c6b7400e6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f26e9d2ab6d19cc842a9a69e486751c6b7400e6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9593) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] TengHuo commented on a diff in pull request #5987: [HUDI-4331] Allow loading external config file from class loader

Posted by GitBox <gi...@apache.org>.
TengHuo commented on code in PR #5987:
URL: https://github.com/apache/hudi/pull/5987#discussion_r982231380


##########
hudi-common/src/main/java/org/apache/hudi/common/config/DFSPropertiesConfiguration.java:
##########
@@ -90,11 +91,24 @@ public DFSPropertiesConfiguration() {
   }
 
   /**
-   * Load global props from hudi-defaults.conf which is under CONF_FILE_DIR_ENV_NAME.
+   * Load global props from hudi-defaults.conf which is under class loader or CONF_FILE_DIR_ENV_NAME.
    * @return Typed Properties
    */
   public static TypedProperties loadGlobalProps() {
     DFSPropertiesConfiguration conf = new DFSPropertiesConfiguration();
+
+    // First try loading the external config file from class loader
+    URL configFile = Thread.currentThread().getContextClassLoader().getResource(DEFAULT_PROPERTIES_FILE);
+    if (configFile != null) {
+      try (BufferedReader br = new BufferedReader(new InputStreamReader(configFile.openStream()))) {
+        conf.addPropsFromStream(br);
+        return conf.getProps();

Review Comment:
   It's a good feature that we can setup a config file from classpath. 
   
   But I think this `return` statement here makes it can't work with `getConfPathFromEnv()` together. Should it be better if we just `conf.addPropsFromStream(br)` here?
   
   Then, if user setup `HUDI_CONF_DIR` in system env, the config file can be loaded as the old behaviour.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org