Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/28 09:48:52 UTC

[GitHub] [iceberg] Zhangg7723 commented on a diff in pull request #4945: Add table spec changes for statistics information in table snapshot

Zhangg7723 commented on code in PR #4945:
URL: https://github.com/apache/iceberg/pull/4945#discussion_r908278654


##########
format/spec.md:
##########
@@ -486,16 +486,17 @@ When reading v1 manifests with no sequence number column, sequence numbers for a
 
 A snapshot consists of the following fields:
 
-| v1         | v2         | Field                    | Description |
-| ---------- | ---------- | ------------------------ | ----------- |
-| _required_ | _required_ | **`snapshot-id`**        | A unique long ID |
-| _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent |
-|            | _required_ | **`sequence-number`**    | A monotonically increasing long that tracks the order of changes to a table |
-| _required_ | _required_ | **`timestamp-ms`**       | A timestamp when the snapshot was created, used for garbage collection and table inspection |
-| _optional_ | _required_ | **`manifest-list`**      | The location of a manifest list for this snapshot that tracks manifest files with additional metadata |
-| _optional_ |            | **`manifests`**          | A list of manifest file locations. Must be omitted if `manifest-list` is present |
-| _optional_ | _required_ | **`summary`**            | A string map that summarizes the snapshot changes, including `operation` (see below) |
-| _optional_ | _optional_ | **`schema-id`**          | ID of the table's current schema when the snapshot was created |
+| v1         | v2         | Field                    | Description                                                                                                                                                     |
+| ---------- | ---------- | ------------------------ |-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| _required_ | _required_ | **`snapshot-id`**        | A unique long ID                                                                                                                                                |
+| _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent                                                                               |
+|            | _required_ | **`sequence-number`**    | A monotonically increasing long that tracks the order of changes to a table                                                                                     |
+| _required_ | _required_ | **`timestamp-ms`**       | A timestamp when the snapshot was created, used for garbage collection and table inspection                                                                     |
+| _optional_ | _required_ | **`manifest-list`**      | The location of a manifest list for this snapshot that tracks manifest files with additional metadata                                                           |
+| _optional_ |            | **`manifests`**          | A list of manifest file locations. Must be omitted if `manifest-list` is present                                                                                |
+| _optional_ | _required_ | **`summary`**            | A string map that summarizes the snapshot changes, including `operation` (see below)                                                                            |
+| _optional_ | _optional_ | **`schema-id`**          | ID of the table's current schema when the snapshot was created                                                                                                  |
+| _optional_ | _optional_ | **`statistics`**         | A [statistics file's metadata](#statistics-file). The field should be retained by writers, unless writer updates the statistics, or knows they became obsolete. |

Review Comment:
   +1 for attaching the stats file at the table level. Stats data needs to be recalculated for every snapshot that inserts or updates data, and historical stats files are only used for time-travel queries, not for the next calculation, so retaining them per snapshot seems to be of lower value.
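The retention rule in the proposed `statistics` row ("retained by writers, unless writer updates the statistics, or knows they became obsolete") can be read together with the point above that stats must be recomputed after any snapshot that inserts or updates data. What follows is a minimal sketch of that reading, not Iceberg's implementation. Assumptions: `StatisticsFile`, `Snapshot`, and `commit_snapshot` are hypothetical names. The operation names `append`, `replace`, `overwrite`, and `delete` are the values of the spec's `summary.operation`. The sketch also assumes that a writer treats any data-changing operation as making the parent's stats obsolete.

```python
from dataclasses import dataclass
from typing import Optional

# Values of the snapshot summary's `operation`. Only "replace" rewrites files
# without changing table data; the others add or remove rows.
# Assumption: a writer treats these as making the parent's stats obsolete.
DATA_CHANGING_OPERATIONS = {"append", "overwrite", "delete"}


@dataclass(frozen=True)
class StatisticsFile:
    """Hypothetical stand-in for 'a statistics file's metadata'."""
    path: str
    computed_for_snapshot_id: int


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical subset of the snapshot fields from the table above."""
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    operation: str
    statistics: Optional[StatisticsFile] = None


def commit_snapshot(
    parent: Optional[Snapshot],
    snapshot_id: int,
    operation: str,
    recomputed: Optional[StatisticsFile] = None,
) -> Snapshot:
    """Create a child snapshot, deciding which statistics it carries."""
    if recomputed is not None:
        # The writer updated the statistics itself: use the new file.
        stats = recomputed
    elif parent is None or operation in DATA_CHANGING_OPERATIONS:
        # Nothing to inherit, or the writer knows the parent's stats are obsolete.
        stats = None
    else:
        # e.g. "replace" (compaction): data is unchanged, so retain the parent's stats.
        stats = parent.statistics
    return Snapshot(
        snapshot_id=snapshot_id,
        parent_snapshot_id=parent.snapshot_id if parent is not None else None,
        operation=operation,
        statistics=stats,
    )
```

For example, the following sequence shows how the statistics change. A compaction (`replace`) keeps the stats computed for snapshot 1. The next `append` drops them until a writer recomputes them, which is the recalculation cost the comment refers to.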



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org