You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/24 07:50:50 UTC
[GitHub] [incubator-hudi] n3nash opened a new pull request #1278:
Refactoring getter to avoid double extrametadata in json representation
n3nash opened a new pull request #1278: Refactoring getter to avoid double extrametadata in json representation
URL: https://github.com/apache/incubator-hudi/pull/1278
## What is the purpose of the pull request
This PR fixes the double representation of the extra metadata map in HoodieCommitMetadata
## Brief change log
Refactored getter in HoodieCommitMetadata to avoid double string representation when object is converted to json string
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request
#1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json
representation of HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r382159855
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -47,15 +47,15 @@
protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
protected Boolean compacted;
- private Map<String, String> extraMetadataMap;
+ private Map<String, String> extraMetadata;
Review comment:
I was asking about any issues around storing extra-metadata in commit files in active timeline. For example, As we store them in json format in active timeline, wanted to check if DeltaStreamer can read checkpoints without any problem from the json file. But,,having thought about it, this should be fine.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] bvaradar merged pull request #1278: [HUDI-573]
Refactoring getter to avoid double extrametadata in json representation of
HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
bvaradar merged pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] n3nash commented on a change in pull request
#1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json
representation of HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r371418845
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -47,15 +47,15 @@
protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
protected Boolean compacted;
- private Map<String, String> extraMetadataMap;
+ private Map<String, String> extraMetadata;
Review comment:
So the change I have done doesn't change AVRO generated pojo (which is what is really used to serialize to disk), so that should work just fine.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request
#1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json
representation of HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r370946850
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -47,15 +47,15 @@
protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
protected Boolean compacted;
- private Map<String, String> extraMetadataMap;
+ private Map<String, String> extraMetadata;
Review comment:
@n3nash : Is there anything we need to do to handle backwards compatibility with commit metadata written in older format. Would this change automatically handle that ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] vinothchandar commented on issue #1278: [HUDI-573]
Refactoring getter to avoid double extrametadata in json representation of
HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#issuecomment-589546842
@bvaradar please check prior to merging if the commit message will retain the JIRA number :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] n3nash edited a comment on issue #1278: [HUDI-573]
Refactoring getter to avoid double extrametadata in json representation of
HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
n3nash edited a comment on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#issuecomment-578251544
When ObjectMapper converts a POJO to json string, it generates getters for the variables in the class as well as pulls the existing getters in the class after which it will stitch together the object to string. In HoodieCommitMetadata, there is a variable extraMetadataMap and also a getter method getExtraMetadata(). During serialization, there will be 2 object to string conversions 1) getExtraMetadata() 2) getExtraMetadataMap() (created by ObjectMapper) that results in 2 copies of the same data in JSON.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] n3nash commented on a change in pull request
#1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json
representation of HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#discussion_r371418845
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -47,15 +47,15 @@
protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
protected Boolean compacted;
- private Map<String, String> extraMetadataMap;
+ private Map<String, String> extraMetadata;
Review comment:
So the change I have done doesn't change AVRO generated pojo, so that should work just fine.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-hudi] n3nash commented on issue #1278: [HUDI-573]
Refactoring getter to avoid double extrametadata in json representation of
HoodieCommitMetadata
Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #1278: [HUDI-573] Refactoring getter to avoid double extrametadata in json representation of HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1278#issuecomment-578251544
When ObjectMapper converts a POJO to json string, it generates getters for the variables in the class as well as pulls the existing getters in the class after which it will stitch together the object to string. In HoodieCommitMetadata, there is a variable extraMetadataMap and also a getting getExtraMetadata(). During serialization, there will be 2 object to string conversions 1) getExtraMetadata() 2) getExtraMetadataMap() (created by ObjectMapper) that results in 2 copies of the same data in JSON.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services