Posted to issues@iceberg.apache.org by "LucasRoesler (via GitHub)" <gi...@apache.org> on 2023/05/11 08:07:51 UTC

[GitHub] [iceberg] LucasRoesler opened a new issue, #7584: AWS: provide option to hide old fields in Glue table

LucasRoesler opened a new issue, #7584:
URL: https://github.com/apache/iceberg/issues/7584

   ### Feature Request / Improvement
   
   In https://github.com/apache/iceberg/pull/3888 the Glue schema generation was adjusted so that all old fields are included in the schema. The original reasoning was 
   
   > so that people know what were the columns that were already used in the past and avoid adding the same name column.
   
   In my organization, there are _many_ users of these tables via Athena who are not the data engineers who own the schema. They have no idea about the old schema, they are not editing the schema, and their default use case is querying the current data. They find it confusing that the schema shows a field that does not exist and that produces errors if they attempt to use it.
   
   Neither Athena nor Glue seems to have any support for displaying these old fields as non-active or deprecated, or for hiding them. Therefore, it would be nice to have a configuration option to disable including non-current fields in the schema.
   
   ### Query engine
   
   Athena


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dertodestod commented on issue #7584: AWS: provide option to hide old fields in Glue table

Posted by "dertodestod (via GitHub)" <gi...@apache.org>.
dertodestod commented on issue #7584:
URL: https://github.com/apache/iceberg/issues/7584#issuecomment-1549688695

   I also don't quite understand the current behavior in Athena/Glue when a column is dropped. I can see that a new schema without the column is created in the metadata file, and that in Glue the column moves to the end of the table and gets a _"iceberg.field.current": "false"_ setting. However, the column still shows up for consumers in the Athena web console (but not in the output of a DESCRIBE of the table), which has led to some confusion in our business.
   
   I couldn't check if the column appears via JDBC (because of some errors) but I guess the column won't be listed because I see in Athena that a DESCRIBE query is used to retrieve that information. Can someone confirm that? 
   
   I personally think that Athena should not show the deleted columns (neither in the web console nor via JDBC). Is there perhaps a way to keep track of the dropped column(s) without showing them in Athena? If not, it would be great if one could be created.
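
For readers who consume the Glue catalog programmatically, the "iceberg.field.current" column parameter described above can be used to hide dropped columns on the client side. The following is only a minimal sketch, assuming the table dictionary has the shape of the "Table" entry returned by the Glue GetTable API (for example via boto3's `get_table`). The function name `current_columns` and the sample data are hypothetical, and columns without the parameter are assumed to be current.

```python
from typing import Any, Dict, List

# Column parameter written by Iceberg's Glue integration, as quoted in this thread.
ICEBERG_FIELD_CURRENT = "iceberg.field.current"


def current_columns(table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the columns that are not marked iceberg.field.current=false.

    `table` is assumed to look like the "Table" entry of a Glue GetTable
    response. Columns lacking the parameter are treated as current
    (an assumption made for this sketch).
    """
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    kept = []
    for col in columns:
        params = col.get("Parameters") or {}
        flag = str(params.get(ICEBERG_FIELD_CURRENT, "true")).lower()
        if flag != "false":
            kept.append(col)
    return kept


# Hypothetical sample: "legacy_code" was dropped from the Iceberg schema.
sample_table = {
    "Name": "events",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "id", "Type": "bigint",
             "Parameters": {ICEBERG_FIELD_CURRENT: "true"}},
            {"Name": "name", "Type": "string",
             "Parameters": {ICEBERG_FIELD_CURRENT: "true"}},
            {"Name": "legacy_code", "Type": "string",
             "Parameters": {ICEBERG_FIELD_CURRENT: "false"}},
        ]
    },
}

visible_names = [col["Name"] for col in current_columns(sample_table)]
print(visible_names)
```

With a real catalog, the input would presumably come from something like `boto3.client("glue").get_table(DatabaseName=..., Name=...)["Table"]`. This only filters what a client sees; it does not change what the Athena console displays.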




Re: [I] AWS: provide option to hide old fields in Glue table [iceberg]

Posted by "tcassou (via GitHub)" <gi...@apache.org>.
tcassou commented on issue #7584:
URL: https://github.com/apache/iceberg/issues/7584#issuecomment-1872542758

   Hello! Our organization is facing the same problem. In particular, the Glue API returns columns that cannot be resolved in the source data, causing queries to fail. We've been using Presto views created dynamically, and they break every time a column is dropped.
   
   Technically, schema versioning is meant to solve this challenge:
   
   > so that people know what were the columns that were already used in the past and avoid adding the same name column.
   
   The latest schema of a table should be aligned with the data, and previous versions will keep track of historical modifications.
   Could we think of publishing new schema versions in Glue instead of this workaround that introduced bugs/defects?
   Or at the very least making this newly introduced behavior optional?
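
Until such an option exists, dynamically generated views could skip non-current columns when building their SELECT list. The sketch below is an illustration under stated assumptions, not the setup described in the comment above: the helper name `build_view_sql`, the view and table names, and the sample columns are all hypothetical, and the column dictionaries are assumed to follow the Glue column shape with an "iceberg.field.current" parameter.

```python
from typing import Any, Dict, List


def quote_identifier(name: str) -> str:
    """Quote an identifier for Presto, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_view_sql(view_name: str, table_name: str,
                   columns: List[Dict[str, Any]]) -> str:
    """Build a CREATE OR REPLACE VIEW statement over current columns only.

    A column is skipped when its Parameters mark it with
    iceberg.field.current=false; anything else is kept.
    """
    current = [
        col["Name"]
        for col in columns
        if (col.get("Parameters") or {}).get("iceberg.field.current") != "false"
    ]
    if not current:
        raise ValueError("no current columns to select")
    select_list = ", ".join(quote_identifier(name) for name in current)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT {select_list} FROM {table_name}"
    )


# Hypothetical Glue-style column list in which "old_col" has been dropped.
sample_columns = [
    {"Name": "id", "Type": "bigint",
     "Parameters": {"iceberg.field.current": "true"}},
    {"Name": "name", "Type": "string",
     "Parameters": {"iceberg.field.current": "true"}},
    {"Name": "old_col", "Type": "string",
     "Parameters": {"iceberg.field.current": "false"}},
]

view_sql = build_view_sql("analytics.events_v", "analytics.events", sample_columns)
print(view_sql)
```

The view and table names are inserted as-is, so they should come from trusted configuration rather than user input.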
   




[GitHub] [iceberg] wojciechjak commented on issue #7584: AWS: provide option to hide old fields in Glue table

Posted by "wojciechjak (via GitHub)" <gi...@apache.org>.
wojciechjak commented on issue #7584:
URL: https://github.com/apache/iceberg/issues/7584#issuecomment-1580479891

   Also the same issue when renaming columns.




[GitHub] [iceberg] pdehaansbp commented on issue #7584: AWS: provide option to hide old fields in Glue table

Posted by "pdehaansbp (via GitHub)" <gi...@apache.org>.
pdehaansbp commented on issue #7584:
URL: https://github.com/apache/iceberg/issues/7584#issuecomment-1653680725

   Same issue. Curious to read what @jackye1995 and @yyanyy think about it.

