You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/11 07:53:09 UTC

[GitHub] [hudi] boneanxs opened a new pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

boneanxs opened a new pull request #4791:
URL: https://github.com/apache/hudi/pull/4791


   …exist or not
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   TypedProperties.keyExist has a very bad performance because it copy all properties during check, actually this is unnecessary after this pr: [HUDI-1943](https://issues.apache.org/jira/browse/HUDI-1943)
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] boneanxs commented on pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
boneanxs commented on pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#issuecomment-1036950255


   @leesf @danny0405 Could you pls review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] boneanxs commented on a change in pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
boneanxs commented on a change in pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#discussion_r804452202



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
##########
@@ -44,27 +43,22 @@ public TypedProperties(Properties defaults) {
   }
 
   private void checkKey(String property) {
-    if (!keyExists(property)) {
+    if (!containsKey(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");

Review comment:
       You mean we can support key with different type? After this, TypedProperties also can only accept keys which is strings, because all public methods can only accept key as String




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#issuecomment-1035957077


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3c061f10be730b7edb473cdb90be1aa5236a28a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#discussion_r805108288



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
##########
@@ -44,27 +43,22 @@ public TypedProperties(Properties defaults) {
   }
 
   private void checkKey(String property) {
-    if (!keyExists(property)) {
+    if (!containsKey(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");

Review comment:
       Can we explain why we get performance promotion here ? The `TypedProperties` should not be a hotspot method right ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] boneanxs commented on a change in pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
boneanxs commented on a change in pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#discussion_r805292689



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
##########
@@ -44,27 +43,22 @@ public TypedProperties(Properties defaults) {
   }
 
   private void checkKey(String property) {
-    if (!keyExists(property)) {
+    if (!containsKey(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");

Review comment:
       The `TypedProperties` could be a hotspot method if we use `CustomKeyGenerator` to do the bulk insert, because it will create `SimpleKeyGenerator` and `ComplexKeyGenerator` when `getPartitionPath` and `getRecordKey`(these two methods will called for every rows in the dataset), creating these generators will call its' parent constructor method,
   
   ```java
   protected BaseKeyGenerator(TypedProperties config) {
       super(config);
       this.encodePartitionPath = config.getBoolean(KeyGeneratorOptions.URL_ENCODE_PARTITIONING.key(),
           Boolean.parseBoolean(KeyGeneratorOptions.URL_ENCODE_PARTITIONING.defaultValue()));
       this.hiveStylePartitioning = config.getBoolean(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key(),
           Boolean.parseBoolean(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.defaultValue()));
       this.consistentLogicalTimestampEnabled = config.getBoolean(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.key(),
           Boolean.parseBoolean(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.defaultValue()));
     }
   ```
   As `TypedProperties.getBolean` will call `keyExists`, which copy all items to do the check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#issuecomment-1035957077


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3c061f10be730b7edb473cdb90be1aa5236a28a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#issuecomment-1035990377


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5905",
       "triggerID" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3c061f10be730b7edb473cdb90be1aa5236a28a7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5905) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] boneanxs commented on a change in pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
boneanxs commented on a change in pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#discussion_r805292689



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
##########
@@ -44,27 +43,22 @@ public TypedProperties(Properties defaults) {
   }
 
   private void checkKey(String property) {
-    if (!keyExists(property)) {
+    if (!containsKey(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");

Review comment:
       The `TypedProperties` could be a hotspot method if we use `CustomKeyGenerator` to do the bulk insert, because it will create `SimpleKeyGenerator` and `ComplexKeyGenerator` when `getPartitionPath` and `getRecordKey`(these two methods will called for every rows in the dataset), creating these generators will call its' parent constructor method,
   
   ```java
   protected BaseKeyGenerator(TypedProperties config) {
       super(config);
       this.encodePartitionPath = config.getBoolean(KeyGeneratorOptions.URL_ENCODE_PARTITIONING.key(),
           Boolean.parseBoolean(KeyGeneratorOptions.URL_ENCODE_PARTITIONING.defaultValue()));
       this.hiveStylePartitioning = config.getBoolean(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key(),
           Boolean.parseBoolean(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.defaultValue()));
       this.consistentLogicalTimestampEnabled = config.getBoolean(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.key(),
           Boolean.parseBoolean(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.defaultValue()));
     }
   ```
   As `TypedProperties.getBolean` will call `keyExists`, which copy all items to do the check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#issuecomment-1035958612


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5905",
       "triggerID" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3c061f10be730b7edb473cdb90be1aa5236a28a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5905) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#discussion_r805482884



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
##########
@@ -44,27 +43,22 @@ public TypedProperties(Properties defaults) {
   }
 
   private void checkKey(String property) {
-    if (!keyExists(property)) {
+    if (!containsKey(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");

Review comment:
       Thanks, make sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 merged pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
danny0405 merged pull request #4791:
URL: https://github.com/apache/hudi/pull/4791


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 commented on a change in pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#discussion_r804433456



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
##########
@@ -44,27 +43,22 @@ public TypedProperties(Properties defaults) {
   }
 
   private void checkKey(String property) {
-    if (!keyExists(property)) {
+    if (!containsKey(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");

Review comment:
       The old code requires that the key should be a string, what about the new code then ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4791: [HUDI-3412] TypedProperties no need to create new set when check key …

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4791:
URL: https://github.com/apache/hudi/pull/4791#issuecomment-1035958612


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5905",
       "triggerID" : "3c061f10be730b7edb473cdb90be1aa5236a28a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3c061f10be730b7edb473cdb90be1aa5236a28a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5905) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org