
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/20 21:50:00 UTC

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: ISSUE-1220: add option to disable manifest reading during estimateSta…

rdblue commented on a change in pull request #1221:
URL: https://github.com/apache/iceberg/pull/1221#discussion_r457711539



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -276,6 +280,9 @@ public void pruneColumns(StructType newRequestedSchema) {
 
   @Override
   public Statistics estimateStatistics() {
+    if(disableEstimateStatistics) {
+      return new Stats(Long.MAX_VALUE, Long.MAX_VALUE);
+    }

Review comment:
       I just had an idea for an alternative solution to this. What about detecting that there are no filters and instead returning a value based on the [`total-records`](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L37) value in snapshot metadata?
   
   Usually, estimating stats based on the number of rows and a guess for the size of a row is much better than using the actual size anyway. So if you can get the number of rows and come up with an estimate for the size of each row based on the table schema, then you wouldn't need to disable stats at all.
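A rough sketch of that idea, not the actual `Reader` change. It assumes the snapshot summary is available as a plain `Map<String, String>` containing the `total-records` key, and that the per-type byte sizes in the hypothetical `ColumnType` enum are acceptable guesses. The class name, the helper names, and the `Long.MAX_VALUE` fallback are placeholders chosen for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, standalone sketch: it does not use the Iceberg or Spark APIs.
class RowCountStatsSketch {

  // Assumed per-column size guesses in bytes; real values would depend on the engine.
  enum ColumnType {
    BOOLEAN(1), INT(4), LONG(8), DOUBLE(8), STRING(20);

    final int guessedSize;

    ColumnType(int guessedSize) {
      this.guessedSize = guessedSize;
    }
  }

  // Simple holder for the two numbers a statistics object would report.
  static final class Estimate {
    final long sizeInBytes;
    final long numRows;

    Estimate(long sizeInBytes, long numRows) {
      this.sizeInBytes = sizeInBytes;
      this.numRows = numRows;
    }
  }

  // Sum of the guessed sizes of all columns in the schema.
  static long estimateRowSize(List<ColumnType> columns) {
    long size = 0;
    for (ColumnType column : columns) {
      size += column.guessedSize;
    }
    return size;
  }

  // Multiply two non-negative values, capping at Long.MAX_VALUE on overflow.
  static long saturatingMultiply(long a, long b) {
    try {
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  // With no filters, use total-records from the summary; otherwise fall back
  // to "unknown" (Long.MAX_VALUE here, which is an assumption of this sketch).
  static Estimate estimate(Map<String, String> summary, List<ColumnType> columns, boolean hasFilters) {
    Estimate unknown = new Estimate(Long.MAX_VALUE, Long.MAX_VALUE);
    String totalRecords = summary.get("total-records");
    if (hasFilters || totalRecords == null) {
      return unknown;
    }
    try {
      long numRows = Long.parseLong(totalRecords);
      return new Estimate(saturatingMultiply(numRows, estimateRowSize(columns)), numRows);
    } catch (NumberFormatException e) {
      return unknown;
    }
  }

  public static void main(String[] args) {
    Map<String, String> summary = new HashMap<>();
    summary.put("total-records", "1000");
    List<ColumnType> columns = Arrays.asList(ColumnType.INT, ColumnType.LONG, ColumnType.STRING);
    Estimate result = estimate(summary, columns, false);
    System.out.println(result.numRows + " rows, " + result.sizeInBytes + " bytes");
  }
}
```

In this sketch the unfiltered path never reads manifests, which is the point of the suggestion. When filters are present, a real implementation would still need some other estimate.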



