Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/21 16:41:43 UTC
[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

rdblue commented on a change in pull request #1221:
URL: https://github.com/apache/iceberg/pull/1221#discussion_r458238324



##########
File path: site/docs/configuration.md
##########
@@ -109,14 +110,14 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option    | Default               | Description                                                                               |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id     | (latest)              | Snapshot ID of the table snapshot to read                                                 |
-| as-of-timestamp | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size      | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size         |
-| lookback        | As per table property | Overrides this table's read.split.planning-lookback                                       |
-| file-open-cost  | As per table property | Overrides this table's read.split.open-file-cost                                          |
-
+| Spark option               | Default               | Description                                                                               |
+| -------------------------- | --------------------- | ----------------------------------------------------------------------------------------- |
+| snapshot-id                | (latest)              | Snapshot ID of the table snapshot to read                                                 |
+| as-of-timestamp            | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
+| split-size                 | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size         |
+| lookback                   | As per table property | Overrides this table's read.split.planning-lookback                                       |
+| file-open-cost             | As per table property | Overrides this table's read.split.open-file-cost                                          |
+| use-approximate-statistics | As per table property | Overrides this table's read.spark.read.spark.use-approximate-statistics                   |

Review comment:
       I don't think we need a table option for this. If we were going to return incorrect stats, I would want a flag to enable or disable that behavior. But because we are going to use table-level stats, we can decide when to use them based on whether there are filters: if there is no filter, use table-level stats.
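
       A minimal sketch of that rule, to make the suggestion concrete. This is not the code in the pull request: `EstimateStatisticsSketch`, `FileMeta`, `Stats`, and `estimate` are hypothetical names, and the table-level totals are simply passed in rather than read from Iceberg metadata.

```java
import java.util.List;

public class EstimateStatisticsSketch {

  /** Hypothetical stand-in for the metadata of one data file. */
  static final class FileMeta {
    final long sizeInBytes;
    final long recordCount;
    final boolean matchesFilters; // assumption: the filter result is precomputed

    FileMeta(long sizeInBytes, long recordCount, boolean matchesFilters) {
      this.sizeInBytes = sizeInBytes;
      this.recordCount = recordCount;
      this.matchesFilters = matchesFilters;
    }
  }

  /** Hypothetical holder for the statistics reported to the query planner. */
  static final class Stats {
    final long sizeInBytes;
    final long numRows;

    Stats(long sizeInBytes, long numRows) {
      this.sizeInBytes = sizeInBytes;
      this.numRows = numRows;
    }
  }

  /**
   * No filters: return the table-level totals unchanged, without touching files.
   * With filters: sum only the files that match, so the result reflects the scan.
   */
  static Stats estimate(Stats tableLevelTotals, List<FileMeta> files, boolean hasFilters) {
    if (!hasFilters) {
      return tableLevelTotals;
    }
    long sizeInBytes = 0L;
    long numRows = 0L;
    for (FileMeta file : files) {
      if (file.matchesFilters) {
        sizeInBytes += file.sizeInBytes;
        numRows += file.recordCount;
      }
    }
    return new Stats(sizeInBytes, numRows);
  }
}
```

       Because the unfiltered case returns totals that are already correct for the whole table, there is nothing for a user to opt in or out of, which is why a separate read option looks unnecessary.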



