You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/07 04:31:03 UTC

[GitHub] [iceberg] jzhuge opened a new pull request, #6134: Spec: Add query-column-names to SQL view representation in view spec

jzhuge opened a new pull request, #6134:
URL: https://github.com/apache/iceberg/pull/6134

   Current view spec misses the field `query-column-names` in SQL view representation.
   
   For SELECT star view queries, the schema for the underlying table or view may change after the view has been created.
   Thus, we need to store the column names of the view query, because when using the view, it is better to pick the columns
   according to the name and order when the view was created and omit the extra columns we don't require.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #6134: Spec: Add query-column-names to SQL view representation in view spec
URL: https://github.com/apache/iceberg/pull/6134


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dimas-b commented on a diff in pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

Posted by GitBox <gi...@apache.org>.
dimas-b commented on code in PR #6134:
URL: https://github.com/apache/iceberg/pull/6134#discussion_r1015497935


##########
format/view-spec.md:
##########
@@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S
 | Optional | schema-id | ID of the view's schema when the version was created |
 | Optional | default-catalog | A string specifying the catalog to use when the table or view references in the view definition do not contain an explicit catalog. |
 | Optional | default-namespace | The namespace to use when the table or view references in the view definition do not contain an explicit namespace. Since the namespace may contain multiple parts, it is serialized as a list of strings. |

Review Comment:
   nit: `references` -> `referenced` (`d` at the end) ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

Posted by GitBox <gi...@apache.org>.
wmoustafa commented on code in PR #6134:
URL: https://github.com/apache/iceberg/pull/6134#discussion_r1015129057


##########
format/view-spec.md:
##########
@@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S
 | Optional | schema-id | ID of the view's schema when the version was created |
 | Optional | default-catalog | A string specifying the catalog to use when the table or view references in the view definition do not contain an explicit catalog. |
 | Optional | default-namespace | The namespace to use when the table or view references in the view definition do not contain an explicit namespace. Since the namespace may contain multiple parts, it is serialized as a list of strings. |
+| Optional | query-column-names | The output column names of the query when the view was created. The field aliases are not applied. The list should have the same length as the schema's top level fields. See the example below. |
 | Optional | field-aliases | A list of strings of field aliases optionally specified in the create view statement. The list should have the same length as the schema's top level fields. See the example below. |
 | Optional | field-docs | A list of strings of field comments optionally specified in the create view statement. The list should have the same length as the schema's top level fields. See the example below. |
 
 For `CREATE VIEW v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT col1, col2, ...`,
-the field aliases are 'alias_name', 'alias_name2', and etc., and the field docs are 'docs', null, and etc.
+the field aliases are 'alias_name', 'alias_name2', etc., and the field docs are 'docs', null, etc.
+
+The view schema should have the field aliases applied.
+
+For SELECT star view queries, the schema for the underlying table or view may change after the view has been created.
+Thus, we need to store the column names of the view query, because when using the view, we need to pick the columns
+according to the name and order when the view was created and omit the extra columns we don't require.
+

Review Comment:
   Is not there a chance for them to become ambiguous after evolution? Most query engine just fail the query if the underlying tables evolve in a way that changes the number of columns a `*` expands to.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

Posted by GitBox <gi...@apache.org>.
wmoustafa commented on code in PR #6134:
URL: https://github.com/apache/iceberg/pull/6134#discussion_r1015881111


##########
format/view-spec.md:
##########
@@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S
 | Optional | schema-id | ID of the view's schema when the version was created |
 | Optional | default-catalog | A string specifying the catalog to use when the table or view references in the view definition do not contain an explicit catalog. |
 | Optional | default-namespace | The namespace to use when the table or view references in the view definition do not contain an explicit namespace. Since the namespace may contain multiple parts, it is serialized as a list of strings. |
+| Optional | query-column-names | The output column names of the query when the view was created. The field aliases are not applied. The list should have the same length as the schema's top level fields. See the example below. |
 | Optional | field-aliases | A list of strings of field aliases optionally specified in the create view statement. The list should have the same length as the schema's top level fields. See the example below. |
 | Optional | field-docs | A list of strings of field comments optionally specified in the create view statement. The list should have the same length as the schema's top level fields. See the example below. |
 
 For `CREATE VIEW v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT col1, col2, ...`,
-the field aliases are 'alias_name', 'alias_name2', and etc., and the field docs are 'docs', null, and etc.
+the field aliases are 'alias_name', 'alias_name2', etc., and the field docs are 'docs', null, etc.
+
+The view schema should have the field aliases applied.
+
+For SELECT star view queries, the schema for the underlying table or view may change after the view has been created.
+Thus, we need to store the column names of the view query, because when using the view, we need to pick the columns
+according to the name and order when the view was created and omit the extra columns we don't require.
+

Review Comment:
   So if you create a `SELECT *` view on tables `t1` with columns `(a, b) joined with table `t2` with columns `(x)`, then add column `a` to `t2`, the query text will be ambiguous, and in fact I tried it on Spark and it threw an exception when querying the view: 
   ```The SQL query of view `db1`.`v` has an incompatible schema change and column a cannot be resolved. Expected 1 columns named a but got [a,a]```
   
   The above sounds like an implementation detail of Spark, and probably should not be exposed by Iceberg. One solution to work around it is to record table names as well, not just view names. Another is to expect to expand the query text (i.e., expand the `*`) at view creation time, like what Hive does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6134:
URL: https://github.com/apache/iceberg/pull/6134#issuecomment-1328356142

   I don't think that query column names are the right way to go. I think that the schema referenced by ID should be the pre-alias schema. That supports getting the pre-alias column names as well as more validation on data types and nested fields.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org