Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/14 07:17:18 UTC
[GitHub] [iceberg] shardulm94 opened a new pull request #2694: Core: Validate user provided split planning configs
shardulm94 opened a new pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694
One of our Spark apps using Iceberg started reporting heavy GC activity and was eventually killed by YARN due to OOM. Looking at the jmap output, we found that Iceberg was creating far too many scan tasks:
```
 num     #instances         #bytes  class name
----------------------------------------------
   1:      29334684      938709888  org.apache.iceberg.BaseFileScanTask$SplitScanTask
   2:      29334673      704032152  [Lorg.apache.iceberg.FileScanTask;
   3:      29334673      469354768  org.apache.iceberg.BaseCombinedScanTask
   4:          8964      125336304  [Ljava.lang.Object;
   5:         47486        7061560  [C
   6:         15733        1723088  java.lang.Class
```
It turns out this was caused by an integer overflow in the value the user passed for `split-size`. The user-provided value was `2048 * 1024 * 1024`, which overflowed to `-2147483648`. Since it does not make sense for these split planning parameters to be negative, I added some checks. One thing I am not sure about: should we allow the split open file cost to be negative? Since it is a cost, it can theoretically be negative, but I don't see any practical use case for that.
Thanks @venkata91, who initially figured out the issue in our ecosystem.
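The overflow described above can be reproduced in isolation. The following is a minimal sketch, assuming the user's expression was evaluated in 32-bit `int` arithmetic before being handed to Iceberg (the thread does not say exactly where the conversion happened). The class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical demo class; not part of Iceberg.
class SplitSizeOverflow {

  // All three operands are int literals, so the product is computed in
  // 32-bit arithmetic and wraps around: 2^31 does not fit in an int.
  static int splitSizeAsInt() {
    return 2048 * 1024 * 1024;
  }

  // Making the first operand a long promotes the whole expression to
  // 64-bit arithmetic, so the intended 2 GiB value is preserved.
  static long splitSizeAsLong() {
    return 2048L * 1024 * 1024;
  }

  public static void main(String[] args) {
    System.out.println(splitSizeAsInt());  // prints -2147483648
    System.out.println(splitSizeAsLong()); // prints 2147483648
  }
}
```

Writing the first literal as `2048L` is enough to avoid the wraparound, but a check on the parsed value is still needed to catch configurations that are already negative when they arrive.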
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] rdblue commented on a change in pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#discussion_r650441888
##########
File path: core/src/main/java/org/apache/iceberg/BaseTableScan.java
##########
@@ -223,20 +223,27 @@ public TableScan ignoreResiduals() {
} else {
splitSize = targetSplitSize();
}
+ Preconditions.checkArgument(splitSize > 0, "Split size should be greater than zero. Found: %s", splitSize);
Review comment:
We normally phrase exception messages like "Invalid split size (negative): %s". That says basically the same thing, but is shorter and follows the conventions we use elsewhere.
[GitHub] [iceberg] aokolnychyi commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870874099
It looks like this does not handle use cases where we call `TableScanUtil` directly. Shall we move validation there?
[GitHub] [iceberg] aokolnychyi commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870876552
I think it is best to have this fix in a separate PR. @shardulm94 @rdblue, what do you think about moving the checks into the util?
[GitHub] [iceberg] rdblue commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-869874709
@SreeramGarlapati FYI.
[GitHub] [iceberg] SreeramGarlapati commented on a change in pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
SreeramGarlapati commented on a change in pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#discussion_r661888167
##########
File path: core/src/main/java/org/apache/iceberg/BaseTableScan.java
##########
@@ -223,20 +223,25 @@ public TableScan ignoreResiduals() {
} else {
splitSize = targetSplitSize();
}
+ Preconditions.checkArgument(splitSize > 0, "Invalid split size (negative): %s", splitSize);
+
int lookback;
if (options.containsKey(TableProperties.SPLIT_LOOKBACK)) {
lookback = Integer.parseInt(options.get(TableProperties.SPLIT_LOOKBACK));
} else {
lookback = ops.current().propertyAsInt(
TableProperties.SPLIT_LOOKBACK, TableProperties.SPLIT_LOOKBACK_DEFAULT);
}
+ Preconditions.checkArgument(lookback > 0, "Invalid split planning lookback (negative): %s", lookback);
Review comment:
nit: `non-positive`
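The point of the nit is that a `> 0` check also rejects zero, so "negative" understates what the check forbids. A minimal sketch of a check whose message matches its condition is shown below. It uses a plain `IllegalArgumentException` instead of the `Preconditions.checkArgument` call from the diff so that it runs without extra dependencies, and `SplitConfigChecks` and `checkPositive` are hypothetical names, not Iceberg APIs.

```java
// Hypothetical helper class; not part of Iceberg.
class SplitConfigChecks {

  // Rejects zero and negative values, and says so in the message,
  // following the "Invalid <thing> (<reason>): <value>" phrasing
  // suggested earlier in this review.
  static long checkPositive(String name, long value) {
    if (value <= 0) {
      throw new IllegalArgumentException(
          String.format("Invalid %s (non-positive): %d", name, value));
    }
    return value;
  }

  public static void main(String[] args) {
    // A valid value is returned unchanged.
    System.out.println(checkPositive("split size", 128L * 1024 * 1024));

    // Zero is rejected, which "negative" would not have described.
    try {
      checkPositive("split planning lookback", 0);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```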
[GitHub] [iceberg] RussellSpitzer commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870875705
Yep, I'm doing this in the Compaction PR (since we ran into similar issues when testing compaction), and I put the checks in `TableScanUtil` after review in that PR.
[GitHub] [iceberg] RussellSpitzer commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870908617
https://github.com/apache/iceberg/pull/2759 - Split out of Compaction PR
[GitHub] [iceberg] rdblue commented on a change in pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#discussion_r654570127
##########
File path: core/src/main/java/org/apache/iceberg/BaseTableScan.java
##########
@@ -223,20 +223,27 @@ public TableScan ignoreResiduals() {
} else {
splitSize = targetSplitSize();
}
+ Preconditions.checkArgument(splitSize > 0, "Split size should be greater than zero. Found: %s", splitSize);
Review comment:
@shardulm94 could you update the error messages here?
[GitHub] [iceberg] rdblue closed pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694
[GitHub] [iceberg] aokolnychyi commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-870872659
@RussellSpitzer, is it similar to what you have in compaction?
[GitHub] [iceberg] rdblue commented on pull request #2694: Core: Validate user provided split planning configs
Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2694:
URL: https://github.com/apache/iceberg/pull/2694#issuecomment-872533382
Closing this. It was fixed in #2759.