You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/17 21:55:17 UTC

[GitHub] [iceberg] RussellSpitzer opened a new pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

RussellSpitzer opened a new pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600


   Remove explicit staregy api in Rewrite Data Files APi so that we can more easily add strategies with custom arguements, such as "sort(SortOrder ...)
   
   Change BinPack Defaults and organization to better handle small and large files
   Removes the output size parameter and replaces it with a constant check clause to see if the output of the rewrite would produce a target size file or the input exceeds the MIN_INPUT_FILES param


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600#discussion_r633900116



##########
File path: core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java
##########
@@ -45,20 +44,12 @@
 abstract class BinPackStrategy implements RewriteStrategy {
 
   /**
-   * Minimum number of files that need to be in a file group to be considered
-   * for rewriting. This is considered in conjunction with {@link MIN_OUTPUT_FILES}, both
-   * conditions must pass to consider a group of files to be rewritten.
+   * The minimum number of files that need to be in a file group for it to be considered for
+   * compaction if the total size of that group is not the target size. This can also be thought of as
+   * the maximum number of non-target-size files that should remain in a file group (parititon) after rewriting.

Review comment:
       Also typo on partition, I see now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600#discussion_r633898217



##########
File path: core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java
##########
@@ -45,20 +44,12 @@
 abstract class BinPackStrategy implements RewriteStrategy {
 
   /**
-   * Minimum number of files that need to be in a file group to be considered
-   * for rewriting. This is considered in conjunction with {@link MIN_OUTPUT_FILES}, both
-   * conditions must pass to consider a group of files to be rewritten.
+   * The minimum number of files that need to be in a file group for it to be considered for

Review comment:
       Looks like we have to update the Javadoc for the class as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600#issuecomment-842712499


   Thanks, @RussellSpitzer!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600#discussion_r633899566



##########
File path: core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java
##########
@@ -45,20 +44,12 @@
 abstract class BinPackStrategy implements RewriteStrategy {
 
   /**
-   * Minimum number of files that need to be in a file group to be considered
-   * for rewriting. This is considered in conjunction with {@link MIN_OUTPUT_FILES}, both
-   * conditions must pass to consider a group of files to be rewritten.
+   * The minimum number of files that need to be in a file group for it to be considered for
+   * compaction if the total size of that group is not the target size. This can also be thought of as
+   * the maximum number of non-target-size files that should remain in a file group (parititon) after rewriting.

Review comment:
       nit: `should` -> `can`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600#discussion_r633898541



##########
File path: core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java
##########
@@ -45,20 +44,12 @@
 abstract class BinPackStrategy implements RewriteStrategy {
 
   /**
-   * Minimum number of files that need to be in a file group to be considered
-   * for rewriting. This is considered in conjunction with {@link MIN_OUTPUT_FILES}, both
-   * conditions must pass to consider a group of files to be rewritten.
+   * The minimum number of files that need to be in a file group for it to be considered for
+   * compaction if the total size of that group is not the target size. This can also be thought of as

Review comment:
       `... if the total file group size is less than the target file size`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2600: Refactor of RewriteDataFiles and BinPackStrategy

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2600:
URL: https://github.com/apache/iceberg/pull/2600#discussion_r633899916



##########
File path: core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java
##########
@@ -45,20 +44,12 @@
 abstract class BinPackStrategy implements RewriteStrategy {
 
   /**
-   * Minimum number of files that need to be in a file group to be considered
-   * for rewriting. This is considered in conjunction with {@link MIN_OUTPUT_FILES}, both
-   * conditions must pass to consider a group of files to be rewritten.
+   * The minimum number of files that need to be in a file group for it to be considered for
+   * compaction if the total size of that group is not the target size. This can also be thought of as
+   * the maximum number of non-target-size files that should remain in a file group (parititon) after rewriting.

Review comment:
       I couldn't say why but I think it should be "could" ... replaced




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org