You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/20 02:55:36 UTC

[GitHub] [incubator-hudi] bvaradar opened a new pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

bvaradar opened a new pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255
 
 
   
   ## What is the purpose of the pull request
   
   Make sure by default table layout version honors the configuration in hoodie.properties
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   ## Committer checklist
   
    - [X] Has a corresponding JIRA in PR title & commit
    
    - [X] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368363395
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Good point.. In an ideal world, `hoodie.properties` is the source of truth and once you set this version there, both timeline writers and readers respect that.. but the issue here seems to be that we want to change this to say VERSION_1 even for older tables and rely on the fact that null version and version_1 both can be read by older readers.. 
   
   > If we keep version_0 as default, it would override the version even for new tables which has Version_1 in hoodie.properties
   
   I'd imagine this will happen only if at some point, the user set the value to `version_1` and then switched back to default? This can always happen right, like the user going back to the previous release.. ? 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368363395
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Good point.. In an ideal world, `hoodie.properties` is the source of truth and once you set this version there, both timeline writers and readers respect that.. but the issue here seems to be that we want to change this to say VERSION_1 even for older tables and rely on the fact that null version and version_1 both can be read by older readers.. 
   
   > If we keep version_0 as default, it would override the version even for new tables which has Version_1 in hoodie.properties
   
   I'd imagine this will happen only if at some point, the user set the value to `version_1` and then switched back to default? This can always happen right, like the user going back to the previous release.. ? 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368386980
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/client/utils/ClientUtils.java
 ##########
 @@ -37,6 +37,8 @@
   public static HoodieTableMetaClient createMetaClient(JavaSparkContext jsc, HoodieWriteConfig config,
       boolean loadActiveTimelineOnLoad) {
     return new HoodieTableMetaClient(jsc.hadoopConfiguration(), config.getBasePath(), loadActiveTimelineOnLoad,
-        config.getConsistencyGuardConfig(), Option.of(new TimelineLayoutVersion(config.getTimelineLayoutVersion())));
+        config.getConsistencyGuardConfig(),
+        Option.ofNullable((config.getTimelineLayoutVersion() != null)
+            ? new TimelineLayoutVersion(config.getTimelineLayoutVersion()) : null));
 
 Review comment:
   can the default in HoodieWriteConfig be `Option.empty` instead of null? Then, we can simply make `config.getTimelineLayoutVersion()` return a Option directly

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368398474
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -117,6 +117,13 @@ public HoodieTableMetaClient(Configuration conf, String basePath, boolean loadAc
     TableNotFoundException.checkTableValidity(fs, basePathDir, metaPathDir);
     this.tableConfig = new HoodieTableConfig(fs, metaPath, payloadClassName);
     this.tableType = tableConfig.getTableType();
+    if (layoutVersion.isPresent()) {
+      // Ensure layout version passed in config is not lower than the one seen in hoodie.properties
+      TimelineLayoutVersion tableConfigVersion = tableConfig.getTimelineLayoutVersion();
 
 Review comment:
   You are correct, Table Config Version can be null. I have made tableConfig.getTimelineLayoutVersion() to return an option instead

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar merged pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
bvaradar merged pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368356422
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   and can't we do this using the regular defaults way? why the special handling for containsKey etc? `props.getProperty(k, default)`? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368386515
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##########
 @@ -117,6 +117,13 @@ public HoodieTableMetaClient(Configuration conf, String basePath, boolean loadAc
     TableNotFoundException.checkTableValidity(fs, basePathDir, metaPathDir);
     this.tableConfig = new HoodieTableConfig(fs, metaPath, payloadClassName);
     this.tableType = tableConfig.getTableType();
+    if (layoutVersion.isPresent()) {
+      // Ensure layout version passed in config is not lower than the one seen in hoodie.properties
+      TimelineLayoutVersion tableConfigVersion = tableConfig.getTimelineLayoutVersion();
 
 Review comment:
   can't this be `null`? if so, would `compareTo` below be still happy?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368362263
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   @vinothchandar : Unlike other configs, this is a config defined at 2 levels - hoodie.properties and in HoodieWriteConfig. If we actually keep a non-null value in config, the semantics is that it would override the one in hoodie.properties. If we keep version_0 as default, it would override the version even for new tables which has Version_1 in hoodie.properties ?   Let me know if I am missing something. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368363395
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Good point.. In an ideal world, `hoodie.properties` is the source of truth and once you set this version there, both timeline writers and readers respect that.. but the issue here seems to be that we want to change this to say VERSION_1 even for older tables and rely on the fact that null version and version_1 both can be read by older readers.. 
   
   > If we keep version_0 as default, it would override the version even for new tables which has Version_1 in hoodie.properties
   I'd imagine this will happen only if at some point, the user set the value to `version_1` and then switched back to default? This can always happen right, like the user going back to the previous release.. ? 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368398278
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/client/utils/ClientUtils.java
 ##########
 @@ -37,6 +37,8 @@
   public static HoodieTableMetaClient createMetaClient(JavaSparkContext jsc, HoodieWriteConfig config,
       boolean loadActiveTimelineOnLoad) {
     return new HoodieTableMetaClient(jsc.hadoopConfiguration(), config.getBasePath(), loadActiveTimelineOnLoad,
-        config.getConsistencyGuardConfig(), Option.of(new TimelineLayoutVersion(config.getTimelineLayoutVersion())));
+        config.getConsistencyGuardConfig(),
+        Option.ofNullable((config.getTimelineLayoutVersion() != null)
+            ? new TimelineLayoutVersion(config.getTimelineLayoutVersion()) : null));
 
 Review comment:
   Yeah, this can be simplified. HoodieWriteConfig's TimelineLayoutVersion will not be null as it has default. Will change based on that.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368356051
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   should we just use a VERSION_0 instead of `null`? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368363395
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##########
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-    return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+    return props.containsKey(TIMELINE_LAYOUT_VERSION)
+        ? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Main issue here seems to be that, we want this config to be respected at a write level, even when the value is not in `hoodie.properties`, for e.g support timeline version =1 even for older tables. `hoodie.properties` is involved here, purely because the query (any timeline reader) needs to know how to read the timeline correct? 
   
   code style aside, I feel this is still not very clear to me.. :( 
   
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services