You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "amogh-jahagirdar (via GitHub)" <gi...@apache.org> on 2023/04/24 16:20:05 UTC

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7416: Views: Clean up and clarify the view spec

amogh-jahagirdar commented on code in PR #7416:
URL: https://github.com/apache/iceberg/pull/7416#discussion_r1175521596


##########
format/view-spec.md:
##########
@@ -55,211 +56,275 @@ Writers create view metadata files optimistically, assuming that the current met
 
 The view version metadata file has the following fields:
 
-| Required/Optional | Field Name | Description |
-|-------------------|------------|-------------|
-| Required | format-version | An integer version number for the view format. Currently, this must be 1. Implementations must throw an exception if the view's version is higher than the supported version. |
-| Required | location | The view's base location. This is used to determine where to store view metadata files. |
-| Required | current-version-id | Current version of the view. Set to ‘1’ when the view is first created. |
-| Optional | properties | A string to string map of view properties. This is used for metadata such as "comment" and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata. |
-| Required | versions | An array of structs describing the known versions of the view. The number of versions to retain is controlled by the table property: “version.history.num-entries”. See section [Versions](#versions). |
-| Required | version-log | A list of timestamp and version ID pairs that encodes changes to the current version for the view. Each time the current-version-id is changed, a new entry should be added with the last-updated-ms and the new current-version-id. |
-| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from Iceberg table spec. |
-| Optional | current-schema-id | ID of the current schema of the view |
+| Requirement | Field name           | Description |
+|-------------|----------------------|-------------|
+| _required_  | `format-version`     | An integer version number for the view format; must be 1 |
+| _required_  | `location`           | The view's base location; used to create metadata file locations |
+| _optional_  | `current-schema-id`  | ID of the current schema of the view, if known |
+| _optional_  | `schemas`            | A list of known schemas |
+| _required_  | `current-version-id` | ID of the current version of the view (`version-id`) |
+| _required_  | `versions`           | A list of known [versions](#versions) of the view [1] |
+| _required_  | `version-log`        | A list of [version log](#version-log) entries with the timestamp and `version-id` for every change to `current-version-id` |
+| _optional_  | `properties`         | A string to string map of view properties [2] |
+
+Notes:
+1. The number of versions to retain is controlled by the table property: `version.history.num-entries`.
+2. Properties are used for metadata such as `comment` and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata.
 
 #### Versions
 
-Field "versions" is an array of structs with the following fields:
+Each version in `versions` is a struct with the following fields:
 
-| Required/Optional | Field Name | Description |
-|-------------------|------------|-------------|
-| Required | version-id | Monotonically increasing id indicating the version of the view. Starts with 1. |
-| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the version of the view was created. |
-| Required | summary | A string map summarizes the version changes, including `operation`, described in [Summary](#summary). |
-| Required | representations | A list of "representations" as described in [Representations](#representations). |
+| Requirement | Field name        | Description |
+|-------------|-------------------|-------------|
+| _required_  | `version-id`      | ID for the version |
+| _required_  | `timestamp-ms`    | Timestamp when the version was created (ms from epoch) |
+| _required_  | `summary`         | A string to string map of [summary metadata](#summary) about the version |
+| _required_  | `representations` | A list of [representations](#representations) for the view definition |
 
-#### Version Log
+#### Summary
 
-Field “version-log” is an array of structs that describe when each version was considered "current". Creation time is different and is stored in each version's metadata. This allows you to reconstruct what someone would have seen at some point in time. If the view has been updated and rolled back, this will show it. The struct has the following fields:
+Summary is a string to string map of metadata about a view version. Common metadata keys are documented here.
 
-| Required/Optional | Field Name | Description |
-|-------------------|------------|-------------|
-| Required | timestamp-ms | The timestamp when the referenced version was made the current version |
-| Required | version-id | Version id of the view  |
+| Requirement | Key              | Value |
+|-------------|------------------|-------|
+| _required_  | `operation`      | Operation that caused this metadata to be created; must be `create` or `replace` |
+| _optional_  | `engine-name`    | Name of the engine that created the view version |
+| _optional_  | `engine-version` | Version of the engine that created the view version |
 
-#### Summary
+#### Representations
 
-Field “summary” is a string map with the following keys. Only `operation` is required. Engines may store additional key-value pairs in this map.
+View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-| Required/Optional | Key | Value |
-|-------------------|-----|-------|
-| Required | operation | A string value indicating the view operation that caused this metadata to be created. Allowed values are “create” and “replace”. |
-| Optional | engine-version | A string value indicating the version of the engine that performed the operation |
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
 
-#### Representations
+Each representation is an object with at least one common field, `type`, that is one of the following:
+* `sql`: a SQL SELECT statement that defines the view
+
+Representations further define metadata for each type.
+
+##### SQL representation
+
+The SQL representation stores the view definition as a SQL SELECT, with metadata such as the SQL dialect.
+
+| Requirement | Field name          | Type           | Description |
+|-------------|---------------------|----------------|-------------|
+| _required_  | `type`              | `string`       | Must be `sql` |
+| _required_  | `sql`               | `string`       | A SQL SELECT statement |
+| _required_  | `dialect`           | `string`       | The dialect of the `sql` SELECT statement (e.g., "trino" or "spark") |
+| _optional_  | `schema-id`         | `int`          | ID of the schema produced by the SELECT statement |
+| _optional_  | `default-catalog`   | `string`       | Catalog name to use when a reference in the SELECT does not contain a catalog |
+| _optional_  | `default-namespace` | `list<string>` | Namespace to use when a reference in the SELECT is a single identifier |
+| _optional_  | `field-aliases`     | `list<string>` | Column names optionally specified in the create statement |
+| _optional_  | `field-docs`        | `list<string>` | Column descriptions (COMMENT) optionally specified in the create statement |

Review Comment:
   Good catch, I think I'd prefer to change the spec to be `field-comments`. My rationale is that most operations (at least via SQL) will actually explicitly refer to the word "comment" so it seems more natural. 
   
   for example `CREATE VIEW v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT col1, col2, ...`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org