Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/14 07:17:18 UTC

[GitHub] [iceberg] shardulm94 opened a new pull request #2694: Core: Validate user provided split planning configs

shardulm94 opened a new pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694


   One of our Spark apps using Iceberg started reporting heavy GC activity and was eventually killed by YARN due to OOM. Looking at the `jmap` histogram, we found that Iceberg was creating far too many scan tasks:
   ```
   num     #instances         #bytes  class name
   ----------------------------------------------
      1:      29334684      938709888  org.apache.iceberg.BaseFileScanTask$SplitScanTask
      2:      29334673      704032152  [Lorg.apache.iceberg.FileScanTask;
      3:      29334673      469354768  org.apache.iceberg.BaseCombinedScanTask
      4:          8964      125336304  [Ljava.lang.Object;
      5:         47486        7061560  [C
      6:         15733        1723088  java.lang.Class
    ```
    It turns out this was caused by an integer overflow in the user-provided `split-size`. The user-provided value was `2048 * 1024 * 1024`, which was converted to `-2147483648`. Since it does not make sense for these split planning parameters to be negative, I added some checks. One thing I am not sure about: should we allow the split open file cost to be negative? Since it's a cost, it can theoretically be negative, but I don't see any practical use case for that.
    
   Thanks @venkata91, who initially figured out the issue in our ecosystem.
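
The overflow described above can be reproduced in isolation. The following is a minimal sketch, assuming the expression `2048 * 1024 * 1024` was evaluated in 32-bit integer arithmetic before being handed to Iceberg as the option value; the thread does not say exactly where the evaluation happened. The class and method names are hypothetical and exist only for illustration.

```java
// Sketch: how a split size written as 2048 * 1024 * 1024 wraps around in 32-bit arithmetic.
// SplitSizeOverflow is a hypothetical name; nothing here is Iceberg code.
class SplitSizeOverflow {
  // Evaluated as int: 2^31 does not fit in an int, so the product wraps to Integer.MIN_VALUE.
  static int asInt() {
    int mib = 1024 * 1024;
    return 2048 * mib;
  }

  // Widening the first operand to long keeps the intended value of 2 GiB.
  static long asLong() {
    return 2048L * 1024 * 1024;
  }

  // Math.multiplyExact throws ArithmeticException instead of wrapping silently.
  static boolean overflowsInt() {
    try {
      Math.multiplyExact(2048, 1024 * 1024);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(asInt());
    System.out.println(asLong());
    System.out.println(overflowsInt());
  }
}
```

Computing the value as a `long`, or using `Math.multiplyExact`, avoids the silent wraparound on the caller's side. The checks proposed in this pull request guard against the same mistake on the library's side.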


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#discussion_r650441888



##########
File path: core/src/main/java/org/apache/iceberg/BaseTableScan.java
##########
@@ -223,20 +223,27 @@ public TableScan ignoreResiduals() {
     } else {
       splitSize = targetSplitSize();
     }
+    Preconditions.checkArgument(splitSize > 0, "Split size should be greater than zero. Found: %s", splitSize);

Review comment:
       We normally phrase exception messages like "Invalid split size (negative): %s". That says basically the same thing, but is shorter and follows the conventions we use elsewhere.






[GitHub] [iceberg] aokolnychyi commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870874099


   It looks like this does not handle use cases where we call `TableScanUtil` directly. Shall we move validation there?




[GitHub] [iceberg] aokolnychyi commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870876552


   I think it is best to have this fix in a separate PR. @shardulm94 @rdblue, what do you think about moving the checks into the util? 




[GitHub] [iceberg] rdblue commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-869874709


   @SreeramGarlapati FYI.




[GitHub] [iceberg] SreeramGarlapati commented on a change in pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
SreeramGarlapati commented on a change in pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#discussion_r661888167



##########
File path: core/src/main/java/org/apache/iceberg/BaseTableScan.java
##########
@@ -223,20 +223,25 @@ public TableScan ignoreResiduals() {
     } else {
       splitSize = targetSplitSize();
     }
+    Preconditions.checkArgument(splitSize > 0, "Invalid split size (negative): %s", splitSize);
+
     int lookback;
     if (options.containsKey(TableProperties.SPLIT_LOOKBACK)) {
       lookback = Integer.parseInt(options.get(TableProperties.SPLIT_LOOKBACK));
     } else {
       lookback = ops.current().propertyAsInt(
           TableProperties.SPLIT_LOOKBACK, TableProperties.SPLIT_LOOKBACK_DEFAULT);
     }
+    Preconditions.checkArgument(lookback > 0, "Invalid split planning lookback (negative): %s", lookback);

Review comment:
       nit: `non-positive`
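
To illustrate the wording under discussion, here is a minimal sketch of the two checks using the "Invalid ... (reason): value" message convention together with the `non-positive` wording suggested in the nit. It uses a small stand-in for a `checkArgument`-style precondition so that it runs without extra dependencies. `SplitConfigValidation` and `validate` are hypothetical names, not Iceberg APIs, and the sample values in `main` are arbitrary.

```java
// Sketch only: SplitConfigValidation and validate are hypothetical, not part of Iceberg.
class SplitConfigValidation {
  // Minimal stand-in for a checkArgument-style precondition with a %s message template.
  static void checkArgument(boolean condition, String template, Object value) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, value));
    }
  }

  // Both checks use "> 0", so zero is rejected as well as negative values;
  // "non-positive" therefore describes the rejected range more precisely than "negative".
  static void validate(long splitSize, int lookback) {
    checkArgument(splitSize > 0, "Invalid split size (non-positive): %s", splitSize);
    checkArgument(lookback > 0, "Invalid split planning lookback (non-positive): %s", lookback);
  }

  public static void main(String[] args) {
    validate(128L * 1024 * 1024, 10); // passes: both values are positive
    try {
      validate(-2147483648L, 10); // the overflowed split size from the PR description
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```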






[GitHub] [iceberg] RussellSpitzer commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870875705


   Yep, I'm doing this in the Compaction PR (since we ran into similar issues when testing compaction) and put the checks in `TableScanUtil` after review in that PR.




[GitHub] [iceberg] RussellSpitzer commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870908617


   https://github.com/apache/iceberg/pull/2759 - Split out of Compaction PR




[GitHub] [iceberg] rdblue commented on a change in pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#discussion_r654570127



##########
File path: core/src/main/java/org/apache/iceberg/BaseTableScan.java
##########
@@ -223,20 +223,27 @@ public TableScan ignoreResiduals() {
     } else {
       splitSize = targetSplitSize();
     }
+    Preconditions.checkArgument(splitSize > 0, "Split size should be greater than zero. Found: %s", splitSize);

Review comment:
       @shardulm94 could you update the error messages here?






[GitHub] [iceberg] rdblue closed pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694


   




[GitHub] [iceberg] aokolnychyi commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870872659


   @RussellSpitzer, is it similar to what you have in compaction?




[GitHub] [iceberg] rdblue commented on pull request #2694: Core: Validate user provided split planning configs

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-872533382


   Closing this. It was fixed in #2759.



