You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/27 23:49:14 UTC

[GitHub] [iceberg] kbendick opened a new pull request #1991: Avro: Schema conversions should preserve docs

kbendick opened a new pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991


   This addresses issue https://github.com/apache/iceberg/issues/1954


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1991: [WIP] Avro: Schema conversions should preserve docs

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751543148


   So the issue with the merging manifests test appears to be that the test in question uses a hardcoded value for `commit.manifest.target-size-bytes` based on a size expected _without doc comments_.Because the manifest files have doc comments in them, the overall size of each manifest increases and that affects the merging behavior as outlined in the test. 
   
   It's quite possible that we don't want to make this change (or at least that we don't want the doc comments in the manifest files) due to the increase in size of manifest files. I will let others weigh in on that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick edited a comment on pull request #1991: Avro: Schema conversions should preserve field docs

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751543148


   I've added a new test to ensure doc comments are preserved in a round trip conversion. The other test that was touched, `TestMergeAppend#testManifestMergeMinCount`, needed to be updated to account for the slight increase in manifest file size due to the field doc comments that now appear one time in the header.
   
   I also updated the language surrounding the choice of the new `commit.manifest.target-size-bytes` used in the test to account for both v1/v2 sizes. I tried to make the language match the style of what was already there, but feel free to suggest something simpler.
   
   Given that the header is only once per manifest file and the default target size is 8mb, the small increase in manifest file size is well worth the benefit of keeping the field docs. I will let others weigh in on that if they are so inclined.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1991: Avro: Schema conversions should preserve docs

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#discussion_r549226737



##########
File path: core/src/test/java/org/apache/iceberg/TestMergeAppend.java
##########
@@ -239,8 +239,10 @@ public void testMergeWithExistingManifest() {
   public void testManifestMergeMinCount() throws IOException {
     Assert.assertEquals("Table should start empty", 0, listManifestFiles().size());
     table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "2")
-        // each manifest file is 5227 bytes, so 12000 bytes limit will give us 2 bins with 3 manifest/data files.
-        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12000")
+        // Each initial v1/v2 ManifestFile is 5661/6397 bytes respectively. Merging two of the given
+        // manifests make one v1/v2 ManifestFile of 5672/6408 bytes respectively, so 12850 bytes
+        // limit will give us two bins with three manifest/data files.
+        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12850")

Review comment:
       Technically this limit needs to be at least 12817 bytes.
   
   That's one byte larger than the number of bins used in the tests * the size of the largest manifest file in either test run (v1 or v2, v2 is slightly larger) for the set of operations defined in the test to meet their assertions, but I chose a value slightly larger / rounded value to match somewhat how the test was previously set up.
   
   Feel free to suggest a different value. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick removed a comment on pull request #1991: Avro: Schema conversions should preserve docs

Posted by GitBox <gi...@apache.org>.
kbendick removed a comment on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751580575


   I took a look at the default value for the manifest's target size (8mb) and realized the slight increase in size due to the doc comments in the header of the manifest files aren't really that big of a deal.
   
   I'll clean up the test that was failing (which has the `TODO` comments on it) in the morning and then remove the `WIP` from this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1991: [WIP] Avro: Schema conversions should preserve docs

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751580575


   I took a look at the default value for the manifest's target size (8mb) and realized the slight increase in size due to the doc comments in the header of the manifest files aren't really that big of a deal.
   
   I'll clean up the test that was failing (which has the `TODO` comments on it) in the morning and then remove the `WIP` from this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick edited a comment on pull request #1991: [WIP] Avro: Schema conversions should preserve docs

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751543148


   So the issue with the merging manifests test appears to be that the test in question uses a hardcoded value for `commit.manifest.target-size-bytes` based on a size expected _without doc comments_. Because the manifest fields have doc comments in them, the overall size of each manifest increases and that affects the merging behavior as outlined in the test. 
   
   It's quite possible that we don't want to make this change (or at least that we don't want the doc comments in the manifest files) due to the increase in size of manifest files. I will let others weigh in on that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1991: Avro: Schema conversions should preserve field docs

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#discussion_r549477920



##########
File path: core/src/test/java/org/apache/iceberg/TestMergeAppend.java
##########
@@ -239,8 +239,10 @@ public void testMergeWithExistingManifest() {
   public void testManifestMergeMinCount() throws IOException {
     Assert.assertEquals("Table should start empty", 0, listManifestFiles().size());
     table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "2")
-        // each manifest file is 5227 bytes, so 12000 bytes limit will give us 2 bins with 3 manifest/data files.
-        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12000")
+        // Each initial v1/v2 ManifestFile is 5661/6397 bytes respectively. Merging two of the given
+        // manifests make one v1/v2 ManifestFile of 5672/6408 bytes respectively, so 12850 bytes
+        // limit will give us two bins with three manifest/data files.
+        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12850")

Review comment:
       I'll open a patch to up this to 15k 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1991: Avro: Schema conversions should preserve field docs

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick edited a comment on pull request #1991: Avro: Schema conversions should preserve docs

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751543148


   So the issue with the merging manifests test appears to be that the test in question uses a hardcoded value for `commit.manifest.target-size-bytes` based on a size expected _without doc comments_. Because the manifest fields have doc comments in them, the overall size of each manifest increases and that affects the merging behavior as outlined in the test. 
   
   It's quite possible that we don't want to make this change (or at least that we don't want the doc comments in the manifest files) due to the increase in size of manifest files. I will let others weigh in on that. I think the trade off is worth the benefit of keeping the field docs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1991: Avro: Schema conversions should preserve field docs

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#discussion_r549481411



##########
File path: core/src/test/java/org/apache/iceberg/TestMergeAppend.java
##########
@@ -239,8 +239,10 @@ public void testMergeWithExistingManifest() {
   public void testManifestMergeMinCount() throws IOException {
     Assert.assertEquals("Table should start empty", 0, listManifestFiles().size());
     table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "2")
-        // each manifest file is 5227 bytes, so 12000 bytes limit will give us 2 bins with 3 manifest/data files.
-        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12000")
+        // Each initial v1/v2 ManifestFile is 5661/6397 bytes respectively. Merging two of the given
+        // manifests make one v1/v2 ManifestFile of 5672/6408 bytes respectively, so 12850 bytes
+        // limit will give us two bins with three manifest/data files.
+        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12850")

Review comment:
       Opened a PR for the suggested 15k value: https://github.com/apache/iceberg/pull/1999




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1991: Avro: Schema conversions should preserve field docs

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#discussion_r549450197



##########
File path: core/src/test/java/org/apache/iceberg/TestMergeAppend.java
##########
@@ -239,8 +239,10 @@ public void testMergeWithExistingManifest() {
   public void testManifestMergeMinCount() throws IOException {
     Assert.assertEquals("Table should start empty", 0, listManifestFiles().size());
     table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "2")
-        // each manifest file is 5227 bytes, so 12000 bytes limit will give us 2 bins with 3 manifest/data files.
-        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12000")
+        // Each initial v1/v2 ManifestFile is 5661/6397 bytes respectively. Merging two of the given
+        // manifests make one v1/v2 ManifestFile of 5672/6408 bytes respectively, so 12850 bytes
+        // limit will give us two bins with three manifest/data files.
+        .set(TableProperties.MANIFEST_TARGET_SIZE_BYTES, "12850")

Review comment:
       3x the smaller size would be around 17k, and we need it to be at least about 13k, which is 2x the larger size. I'd probably set this to 15k to split the difference and hopefully avoid needing to update this again as tests change. This is minor, though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1991: Avro: Schema conversions should preserve field docs

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1991:
URL: https://github.com/apache/iceberg/pull/1991#issuecomment-751826454


   Thanks, @kbendick! Looks good to me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org