You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/24 07:50:50 UTC

[GitHub] [incubator-hudi] n3nash opened a new pull request #1278: Refactoring getter to avoid double extrametadata in json representation

n3nash opened a new pull request #1278: Refactoring getter to avoid double extrametadata in json representation
URL: https://github.com/apache/incubator-hudi/pull/1278
 
 
   ## What is the purpose of the pull request
   
   This PR fixes the double representation of the extra metadata map in HoodieCommitMetadata
   
   ## Brief change log
   
   Refactored getter in HoodieCommitMetadata to avoid double string representation when object is converted to json string
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r382159855
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##########
 @@ -47,15 +47,15 @@
   protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
   protected Boolean compacted;
 
-  private Map<String, String> extraMetadataMap;
+  private Map<String, String> extraMetadata;
 
 Review comment:
   I was asking about any issues around storing extra-metadata in commit files in active timeline. For example, As we store them in json format in active timeline, wanted to check if DeltaStreamer can read checkpoints without any problem from the json file. But,,having thought about it, this should be fine. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar merged pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
bvaradar merged pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r371418845
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##########
 @@ -47,15 +47,15 @@
   protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
   protected Boolean compacted;
 
-  private Map<String, String> extraMetadataMap;
+  private Map<String, String> extraMetadata;
 
 Review comment:
   So the change I have done doesn't change AVRO generated pojo (which is what is really used to serialize to disk), so that should work just fine.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r370946850
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##########
 @@ -47,15 +47,15 @@
   protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
   protected Boolean compacted;
 
-  private Map<String, String> extraMetadataMap;
+  private Map<String, String> extraMetadata;
 
 Review comment:
   @n3nash : Is there anything we need to do to handle backwards compatibility with commit metadata written in older format. Would this change automatically handle that ? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#issuecomment-589546842
 
 
   @bvaradar please check prior to merging if the commit message will retain the JIRA number :) 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash edited a comment on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
n3nash edited a comment on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#issuecomment-578251544
 
 
   When ObjectMapper converts a POJO to json string, it generates getters for the variables in the class as well as pulls the existing getters in the class after which it will stitch together the object to string. In HoodieCommitMetadata, there is a variable extraMetadataMap and also a getter method getExtraMetadata(). During serialization, there will be 2 object to string conversions 1) getExtraMetadata() 2) getExtraMetadataMap() (created by ObjectMapper) that results in 2 copies of the same data in JSON.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r371418845
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##########
 @@ -47,15 +47,15 @@
   protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
   protected Boolean compacted;
 
-  private Map<String, String> extraMetadataMap;
+  private Map<String, String> extraMetadata;
 
 Review comment:
   So the change I have done doesn't change AVRO generated pojo, so that should work just fine.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] n3nash commented on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#issuecomment-578251544
 
 
   When ObjectMapper converts a POJO to json string, it generates getters for the variables in the class as well as pulls the existing getters in the class after which it will stitch together the object to string. In HoodieCommitMetadata, there is a variable extraMetadataMap and also a getting getExtraMetadata(). During serialization, there will be 2 object to string conversions 1) getExtraMetadata() 2) getExtraMetadataMap() (created by ObjectMapper) that results in 2 copies of the same data in JSON.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services