You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "wmoustafa (via GitHub)" <gi...@apache.org> on 2023/05/02 17:20:16 UTC

[GitHub] [iceberg] wmoustafa opened a new pull request, #7500: Views: Update spec with expectations on versions, representations, and dialects

wmoustafa opened a new pull request, #7500:
URL: https://github.com/apache/iceberg/pull/7500

   Update view spec with expectations on versions, representations, and dialects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1183139302


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.

Review Comment:
   Let's roll this change back since it was more clear before. Engines should not use more than one representation. And if one does, it doesn't need to be accounted for here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1185611553


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.
+
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
 
 Each representation is an object with at least one common field, `type`, that is one of the following:
 * `sql`: a SQL SELECT statement that defines the view
 
+In addition to `type`, each representation defines a `dialect`. A `dialect` is a string that identifies the query language dialect used in the representation. For example, `trino` or `spark`. A view version cannot have duplicate representations with the same `type` and `dialect`.

Review Comment:
   I think what applies to `type` alone should extend to the `type` + `dialect` pair. So if initially we considered multiple `types`, then probably multiple `type` + `dialect` pairs is as valid, unless there is something specific that makes just multiple `types` acceptable, but not multiple `dialects`. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1183140556


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.
+
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
 
 Each representation is an object with at least one common field, `type`, that is one of the following:
 * `sql`: a SQL SELECT statement that defines the view
 
+In addition to `type`, each representation defines a `dialect`. A `dialect` is a string that identifies the query language dialect used in the representation. For example, `trino` or `spark`. A view version cannot have duplicate representations with the same `type` and `dialect`.

Review Comment:
   This isn't true. Representations do not necessarily have a dialect. SQL does because it varies. I would not expect other specifications to have the same variance.
   
   Instead, I think it would be better to state that there should be only one of a give type. Did you want to allow multiple SQL representations with different dialects?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1191476519


##########
format/view-spec.md:
##########
@@ -98,7 +98,11 @@ View definitions can be represented in multiple ways. Representations are docume
 
 A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
 
-Each representation is an object with at least one common field, `type`, that is one of the following:
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
+
+A representation is identified by its `type` and `dialect`. A view version can have multiple representations (i.e., different `type` and `dialect` combinations), but at most one representation for a given `type` and `dialect` pair.

Review Comment:
   This is incorrect because there is no dialect. A `dialect` is only present in the SQL representation. It can be referenced in that section but not here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue merged PR #7500:
URL: https://github.com/apache/iceberg/pull/7500


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1183980083


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.
+
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
 
 Each representation is an object with at least one common field, `type`, that is one of the following:
 * `sql`: a SQL SELECT statement that defines the view
 
+In addition to `type`, each representation defines a `dialect`. A `dialect` is a string that identifies the query language dialect used in the representation. For example, `trino` or `spark`. A view version cannot have duplicate representations with the same `type` and `dialect`.

Review Comment:
   > This isn't true. Representations do not necessarily have a dialect. SQL does because it varies.
   
   The `dialect` field is required. Did not notice something in the spec that says it was not true. 
   
   > I would not expect other specifications to have the same variance.
   
   I was thinking of other languages such as `Datalog`, `SPARQL`, `Cypher`. Those could have dialects too.
   
   > Instead, I think it would be better to state that there should be only one of a give type. Did you want to allow multiple SQL representations with different dialects?
   
   I was actually under the impression that the spec allows multiple SQL representations with different dialects. We can discuss the pros and cons of each option. However, it sounds if allowing multiple types is allowed, then allowing multiple dialects within the type could be allowed as well. A third option is to strictly allow one representation (one `type` and one `dialect`) to guarantee a canonical, SOT representation (I realize this is a big change to the spec, but just putting it here since we are already discussing two other options).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1193231849


##########
format/view-spec.md:
##########
@@ -98,7 +98,11 @@ View definitions can be represented in multiple ways. Representations are docume
 
 A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
 
-Each representation is an object with at least one common field, `type`, that is one of the following:
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
+
+A representation is identified by its `type` and `dialect`. A view version can have multiple representations (i.e., different `type` and `dialect` combinations), but at most one representation for a given `type` and `dialect` pair.

Review Comment:
   I have assumed that the `dialect` field is common to all representations, and its value would be something like `NONE`. Given that different types could be represented by different fields/schemss in the spec, I have moved this part of the spec to the `SQL` type section.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1183139302


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.

Review Comment:
   Let's roll this change back since it was more clear before. Engines should not use more than one representation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1185594321


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.
+
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
 
 Each representation is an object with at least one common field, `type`, that is one of the following:
 * `sql`: a SQL SELECT statement that defines the view
 
+In addition to `type`, each representation defines a `dialect`. A `dialect` is a string that identifies the query language dialect used in the representation. For example, `trino` or `spark`. A view version cannot have duplicate representations with the same `type` and `dialect`.

Review Comment:
   > The dialect field is required. Did not notice something in the spec that says it was not true.
   
   Dialect is required for the `SQLViewRepresentation`. The common representation fields are defined, but currently only include `type`.
   
   > I was thinking of other languages such as Datalog, SPARQL, Cypher. Those could have dialects too.
   
   In that case, a representation that contains those might have a dialect. But IR representations like Substrait would probably not have a dialect.
   
   > I was actually under the impression that the spec allows multiple SQL representations with different dialects.
   
   We should decide. This would be fine with the current encoding. Since we don't have a reliable dialect in all cases, I don't think we would want to build rules based on it. We should just decide whether you can have multiple representations with the same type or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1190610270


##########
format/view-spec.md:
##########
@@ -98,7 +98,11 @@ View definitions can be represented in multiple ways. Representations are docume
 
 A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
 
-Each representation is an object with at least one common field, `type`, that is one of the following:
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
+
+A representation is identified by its `type` and `dialect`. A view version can have multiple representations (i.e., different `type` and `dialect` combinations), but at most have one representation for a given `type` and `dialect` pair.

Review Comment:
   @rdblue could you take another look? What I did:
   * Avoided giving the impression that a type must have more than one dialect. (I think it was the case already before this version, but hopefully this one makes it clearer, especially after removing the `trino`/`spark` example, which is specific to `SQL`).
   * Said that representation is uniquely identified by this pair. In the future, dialect-less types can have a dialect whose value can only be `N/A`, `None`, `""` or something of that sort. The field is required anyways, so we will have to specify one of those values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1190610270


##########
format/view-spec.md:
##########
@@ -98,7 +98,11 @@ View definitions can be represented in multiple ways. Representations are docume
 
 A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
 
-Each representation is an object with at least one common field, `type`, that is one of the following:
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
+
+A representation is identified by its `type` and `dialect`. A view version can have multiple representations (i.e., different `type` and `dialect` combinations), but at most have one representation for a given `type` and `dialect` pair.

Review Comment:
   @rdblue could you take another look? What I did:
   * Avoided giving the impression that a type must have more than one dialect. (I think it was the case already before this version, but hopefully this one makes it clearer, especially after removing the `trino`/`spark` example, which is specific to `SQL`).
   * Said that representation is uniquely identified by the `type` + `dialect` pair. In the future, dialect-less types can have a dialect whose value can only be `N/A`, `None`, `""` or something of that sort. The field is required anyways, so we will have to specify one of those values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7500: Views: Update spec with expectations on versions, representations, and dialects

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7500:
URL: https://github.com/apache/iceberg/pull/7500#discussion_r1186925842


##########
format/view-spec.md:
##########
@@ -96,11 +96,15 @@ Summary is a string to string map of metadata about a view version. Common metad
 
 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
 
-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation(s) to use.
+
+View versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.
 
 Each representation is an object with at least one common field, `type`, that is one of the following:
 * `sql`: a SQL SELECT statement that defines the view
 
+In addition to `type`, each representation defines a `dialect`. A `dialect` is a string that identifies the query language dialect used in the representation. For example, `trino` or `spark`. A view version cannot have duplicate representations with the same `type` and `dialect`.

Review Comment:
   I'm fine with allowing multiple entries for a single type to allow more than one type + dialect pair.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org