Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/16 03:05:42 UTC

[GitHub] [iceberg] jackye1995 opened a new issue #2833: Unified syntax for system table names

jackye1995 opened a new issue #2833:
URL: https://github.com/apache/iceberg/issues/2833


   Currently Spark and Flink use the syntax `db.table.system_table_name` to access system tables, whereas Trino uses `db.table$system_table_name`. I also saw on the dev list that Peter is planning to add system table support to Hive. I am also scoping the snapshot tagging feature, which will add more complexity to the table naming scheme. So I think it's a good time to discuss what the best syntax is going forward.
   
   I remember from #1144 that we realized there is an issue with Spark using dot as the delimiter for the default catalog, and it was never truly fixed. I saw Ryan had suggested using `__` instead. I would like to know everyone's take on this, so that we can provide a more unified experience for all users.
   
   @rdblue @electrum @RussellSpitzer @openinx @pvary 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882721921


   @pvary, in Spark the rules are:
   
   1. If the name has one identifier part, then use the current catalog and namespace with the identifier as the table name
   2. If the multi-part identifier does not start with a catalog name, use the current catalog with the identifier's namespace and table name
   3. If the multi-part identifier starts with a catalog name, it is a full identifier. Use the catalog, namespace, and table name from the identifier
   
   Those never produce ambiguity. The trade-off is that if you use a table name like `customers.history` (where `customers` is a table) then Spark will not fill in the current database/schema name for the namespace. Spark would be able to find `customers` and resolve it to `current_catalog.current_namespace.customers` though.
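
The three rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this comment, not Spark's actual implementation; the `Resolved` type, the `resolve` function, and the catalog name `prod` are made up for the example.

```python
from typing import NamedTuple, Sequence, Set, Tuple


class Resolved(NamedTuple):
    catalog: str
    namespace: Tuple[str, ...]
    table: str


def resolve(parts: Sequence[str], catalogs: Set[str],
            current_catalog: str,
            current_namespace: Tuple[str, ...]) -> Resolved:
    """Resolve a multi-part identifier following the three rules above."""
    if not parts:
        raise ValueError("empty identifier")
    # Rule 1: a single part uses the current catalog and namespace.
    if len(parts) == 1:
        return Resolved(current_catalog, current_namespace, parts[0])
    # Rule 3: a leading catalog name makes this a full identifier.
    if parts[0] in catalogs:
        return Resolved(parts[0], tuple(parts[1:-1]), parts[-1])
    # Rule 2: otherwise use the current catalog; the namespace comes
    # from the identifier itself and is NOT filled in from the session.
    return Resolved(current_catalog, tuple(parts[:-1]), parts[-1])


# The trade-off described above: `customers` is read as a namespace,
# so the current namespace `db` is not used.
print(resolve(["customers", "history"], {"prod"}, "prod", ("db",)))
```

With these inputs, `customers.history` resolves to catalog `prod`, namespace `customers`, table `history`, which is why the current namespace has to be spelled out to reach the metadata table of `db.customers`.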




[GitHub] [iceberg] RussellSpitzer commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882732111


   @pvary Spark looks for the first valid reference to a catalog or database before loading the table; we actually copied the logic here:
   
   https://github.com/apache/iceberg/blob/01393a06c284175edab75de34f48b2bfbd606081/spark/src/main/java/org/apache/iceberg/spark/SparkUtil.java#L80-L79
   
   But you end up checking 
   ```
   1. "Does the name have a single part?"
     a. "Load defaultCatalog - defaultDatabase - name"
   2. Can I treat the first part of the name as a catalog?
     a. Use the catalog, use the last element of the name as the table name, everything else is database
     b. "Load firstPart as Catalog, middlePart as Database, lastPart as TableName" - If database is empty, use default database
   3. Use the default Catalog
     a. Use last element as table name, everything else as database
     b. if database is empty use default database
   ```
   
   So say we have no table "history"
   
   "customer.history" refers to the metadata table
   
   If you have a database customer and table history and iceberg table customer
   
   "customer.customer.history" refers to the metadata table (customer.history is the other table)
   
   If you have a catalog customer and database customer and table history and table customer
   
   "customer.customer.customer.history" refers to the metadata table "history" 
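
A rough Python sketch of the fallback these examples describe: an existing table with the exact name wins, and only otherwise is the last part treated as a metadata table type. It is a simplified model, not the Iceberg or Spark code. Identifiers are plain tuples inside a single catalog, default catalog and database filling is left out, and `METADATA_TYPES` and `load` are names invented for the example.

```python
from typing import Optional, Set, Tuple

Ident = Tuple[str, ...]

# Illustrative subset of metadata table types (assumption for this sketch).
METADATA_TYPES = {"history", "snapshots", "files", "manifests"}


def load(ident: Ident, tables: Set[Ident]) -> Optional[Tuple[Ident, Optional[str]]]:
    """Return (table identifier, metadata type or None), or None if nothing matches."""
    # A real table with this exact name is loaded as a data table.
    if ident in tables:
        return ident, None
    # Otherwise try the last part as a metadata table of the parent table.
    if len(ident) > 1 and ident[-1] in METADATA_TYPES and ident[:-1] in tables:
        return ident[:-1], ident[-1]
    return None


# Database `customer` containing the tables `history` and `customer`.
tables = {("customer", "history"), ("customer", "customer")}
print(load(("customer", "customer", "history"), tables))  # metadata table
print(load(("customer", "history"), tables))              # the other, real table
```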
   




[GitHub] [iceberg] findepi commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
findepi commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-883148258


   As long as the table name is delimited (`"customer.history"`) it's unambiguously an identifier.
   However, when written as `customer.history`, it is a qualified name, and this probably has far-reaching implications with the SQL spec, which governs how names are resolved.
   
   @martint do you think it's possible to use a dot separator for system tables and also obey the SQL specification?
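
To make the distinction concrete, here is a small Python sketch that splits a name on dots only outside double-quoted (delimited) identifiers. It is a toy tokenizer written for this discussion, not any engine's parser; `split_qualified_name` is a hypothetical helper and it ignores case folding and other SQL details.

```python
from typing import List


def split_qualified_name(name: str) -> List[str]:
    """Split on dots that are outside double-quoted identifiers."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == '"':
            if in_quotes and i + 1 < len(name) and name[i + 1] == '"':
                current.append('"')  # a doubled quote is an escaped quote
                i += 1
            else:
                in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated delimited identifier")
    parts.append("".join(current))
    return parts


print(split_qualified_name('"customer.history"'))  # one identifier
print(split_qualified_name('customer.history'))    # a two-part qualified name
```

The delimited form yields a single part, while the undelimited form yields two, so any dot-based system table syntax has to be layered on top of ordinary qualified-name resolution.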




[GitHub] [iceberg] losipiuk commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
losipiuk commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-881375493


   cc: @findepi 






[GitHub] [iceberg] pvary commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-883057433


   Thanks @rdblue and @RussellSpitzer for the detailed answer. Let's translate it for Hive where we do not have Catalogs yet for the queries:
   - If we have a single part identifier use the default database and return the data table
   - If we have multipart identifier, then expect it to be a full identifier (no defaults here)
   
   This results in the same algorithm that we came up with @marton-bod and that I mentioned in my first comment:
   > you have to always provide the db if you want to access the metadata tables
   
   But at least this does not seem so lame anymore 😄
   
   Also it could be simplified to:
   - If we have a 3 part identifier use the last part as a metadata table type, and use the rest as a table identifier
   - If the identifier has 2 or fewer parts, do not change the behaviour
   
   Seems like a manageable change to me. 
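
The simplified rule could look roughly like this in Python. It is a sketch of the rule as written in this comment, not Hive code; `parse_hive_name` and the `default` database name are assumptions, and backquoted names containing dots are not handled.

```python
from typing import Optional, Tuple


def parse_hive_name(name: str, default_db: str = "default") -> Tuple[str, str, Optional[str]]:
    """Return (database, table, metadata table type or None)."""
    parts = name.split(".")
    if len(parts) == 3:
        # 3 parts: the last part is the metadata table type,
        # the rest is the table identifier.
        db, table, metadata_type = parts
        return db, table, metadata_type
    if len(parts) == 2:
        # 2 parts: unchanged behaviour, a plain db.table reference.
        return parts[0], parts[1], None
    if len(parts) == 1:
        # 1 part: unchanged behaviour, the table in the default database.
        return default_db, parts[0], None
    raise ValueError(f"unsupported identifier: {name}")


print(parse_hive_name("sales.customer.history"))
print(parse_hive_name("customer"))
```

Under this rule the database always has to be given to reach a metadata table, which matches the quoted advice above.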




[GitHub] [iceberg] pvary commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-883039932


   > The easy advice for users is “don’t create tables with dots in the name” which is that something that IIRC the Hive metastore doesn’t allow
   
   AFAIK Hive nowadays allows creating tables with dots in their names, but you have to backquote them. I am not sure about the released versions, and as a user I would be cautious about depending on it, but theoretically we can handle them. 




[GitHub] [iceberg] electrum edited a comment on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
electrum edited a comment on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882998681


   Do any Iceberg catalog implementations allow users to create a table name with a dot? If we have a `customer` table, then `customer.history` will go to the history table. If a user then creates a `customer.history` table, how would you reference the history table for `customer`? Would existing queries against the history table be broken?
   
   (ignore namespaces and assume this is in the same namespace / schema)




[GitHub] [iceberg] RussellSpitzer edited a comment on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
RussellSpitzer edited a comment on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-881154751


   The Spark issue is for 2.4 (and 3.0); I think we got the fix into 3.1 (thanks @holdenk!). One fix I was considering was just allowing database.\`table.metadatatype\` and database.\`table$metadatatype\` as Spark identifiers. The patch for Iceberg is pretty small, and it really only stops us from using $ and . inside of table names. That way we can have similar functionality in 2.4, but with a tiny caveat.
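
As a rough illustration of that idea, the backquoted table name could be split on the first `.` or `$` like this. This is a Python sketch, not the actual Iceberg patch; `split_metadata_suffix` is a hypothetical helper and it does not check that the suffix is a known metadata table type.

```python
import re
from typing import Optional, Tuple

# Either separator marks the start of the metadata table type.
_SEPARATOR = re.compile(r"[.$]")


def split_metadata_suffix(table_name: str) -> Tuple[str, Optional[str]]:
    """Split `table.metadatatype` or `table$metadatatype` into its two pieces."""
    pieces = _SEPARATOR.split(table_name, maxsplit=1)
    if len(pieces) == 2 and pieces[0] and pieces[1]:
        return pieces[0], pieces[1]
    # No separator (or an empty side): treat the whole name as a table name.
    return table_name, None


print(split_metadata_suffix("customer.history"))
print(split_metadata_suffix("customer$history"))
print(split_metadata_suffix("customer"))
```

The caveat mentioned above shows up directly: a table whose real name contains `$` or `.` can no longer be told apart from a metadata table reference.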




[GitHub] [iceberg] jackye1995 commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-881170808


   @RussellSpitzer thanks for the info, glad it's fixed in 3.1! I am okay with 2.4 having a caveat as it is an older version. Let's see what others think about it.
   
   I am not trying to suggest supporting both dot and dollar sign; ideally we only support one, which is clean and simple for both developers and users. 
   
   I don't know what open source Trino thinks about changing the dollar sign to a dot. We could have a grace period in Trino that supports both and then drop the old syntax, if that is an acceptable way forward.
   
    @electrum @martint @phd3 @losipiuk




[GitHub] [iceberg] findepi commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
findepi commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-881488774


   For context, Trino's use of `$` was chosen because it's believed to be even less likely to collide with real existing tables than e.g. underscores. 




[GitHub] [iceberg] jackye1995 commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882703014


   @findepi thanks for the context! I agree that using the dollar sign is safer than using the dot. However, given that Spark, Flink, and any new engines that build on core are going to inherit the dot for system tables, we need to decide whether to change it in Trino or to make a backwards-incompatible change in core.
   
   There are 3 ways I see we can go:
   1. everyone uses dot going forward, Trino tries to migrate
   2. everyone uses dollar sign going forward, Iceberg tries to migrate. Since we are releasing v2, a new major version might be a good time to do this without maintaining backwards compatibility.
   3. continue to use dot in core and dollar sign in Trino. I hope we do not go with this approach, as it's quite confusing for end users based on the feedback we received.
   
   
   
   
   




[GitHub] [iceberg] jackye1995 commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-883010680


   @electrum this is allowed.
   
   When `customer.history` and `customer` coexist, I believe that based on the current table loading logic, the history table of the `customer` table will always be loaded instead of `customer.history`:
   
   https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L95-L96
   
   To access the `customer.history` table, the table name needs to be backquoted.






[GitHub] [iceberg] pvary edited a comment on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
pvary edited a comment on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882707266


   How does Spark solve the ambiguity when `customer.history` is the source of the query? Does Spark disallow creating tables named `history`? What happens when a `history` table is created in the `customer` database in the Hive Metastore and the user wants to query that table?
   
   Thanks,
   Peter




[GitHub] [iceberg] pvary commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882707266


   How does Spark resolve the ambiguity when `customer.history` is the source of the query? Does Spark disallow creating tables named `history`? What happens when a `history` table is created in the Hive Metastore and the user wants to query that table?
   
   Thanks,
   Peter


[GitHub] [iceberg] pvary commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-881315289


   We had a related discussion with @marton-bod this morning.
   
   The problem we found is that if we have a query `USE default; SELECT * from customers.history;` in Hive it could be either db=`customers` table=`history`, or after introducing metadata tables it could be db=`default` table=`customers` metaType=`history`. We could only come up with the lame solution of "you have to always provide the db if you want to access the metadata tables, like `SELECT * from default.customers.history`"
   
   Choosing another delimiter would fix this issue.
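
A minimal sketch of the ambiguity described above, and of how a distinct delimiter avoids it. The class and method names are hypothetical and are not part of Hive or Iceberg; the `__` suffix handling assumes that ordinary table names never contain `__`.

```java
import java.util.ArrayList;
import java.util.List;

class DelimiterAmbiguitySketch {

  // Lists every reading of a dot-delimited name when metadata tables also use a dot.
  static List<String> interpretDotted(String name, String currentDb) {
    String[] parts = name.split("\\.");
    List<String> readings = new ArrayList<>();
    if (parts.length == 1) {
      readings.add("db=" + currentDb + " table=" + parts[0]);
    } else if (parts.length == 2) {
      // Two parts can be read as db.table or as table.metaType in the current db.
      readings.add("db=" + parts[0] + " table=" + parts[1]);
      readings.add("db=" + currentDb + " table=" + parts[0] + " metaType=" + parts[1]);
    } else if (parts.length == 3) {
      // Only the fully qualified form is read as a metadata table in this sketch.
      readings.add("db=" + parts[0] + " table=" + parts[1] + " metaType=" + parts[2]);
    }
    return readings;
  }

  // With a "__" suffix, the metadata type is carried inside the table part instead.
  static String interpretDoubleUnderscore(String name, String currentDb) {
    String[] parts = name.split("\\.");
    if (parts.length > 2) {
      throw new IllegalArgumentException("expected [db.]table[__metaType]: " + name);
    }
    String db = parts.length == 2 ? parts[0] : currentDb;
    String tablePart = parts[parts.length - 1];
    int idx = tablePart.lastIndexOf("__");
    if (idx <= 0) {
      return "db=" + db + " table=" + tablePart;
    }
    return "db=" + db + " table=" + tablePart.substring(0, idx)
        + " metaType=" + tablePart.substring(idx + 2);
  }

  public static void main(String[] args) {
    System.out.println(interpretDotted("customers.history", "default"));
    System.out.println(interpretDotted("default.customers.history", "default"));
    System.out.println(interpretDoubleUnderscore("customers__history", "default"));
  }
}
```

With the dot, `customers.history` yields two readings until the database is spelled out; with the suffix there is a single reading, at the cost of reserving `__` in table names.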
   
   I have also seen `__` as a delimiter in this [PR](https://github.com/apache/iceberg/pull/1103/files#diff-572bca25ff182a00d652047c2a5299f6fd61e3af05f53d32a18188741ca58b60R54-R54) by @cmathiesen
   ```
     public static final String SNAPSHOT_TABLE = "iceberg.snapshots.table";
     public static final String SNAPSHOT_TABLE_SUFFIX = "__snapshots";
   ```
   The other part of the code never made it into upstream so these configs were eventually removed.
   
   It would be really good to be able to access the metadata tables in a unified way.
   
   Maybe @boroknagyz is also interested in this from Impala side.
   
   Thanks,
   Peter


[GitHub] [iceberg] RussellSpitzer commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882732111


   @pvary Spark looks for the first valid reference to a catalog or database before loading the table. We actually copied the logic here:
   
   https://github.com/apache/iceberg/blob/01393a06c284175edab75de34f48b2bfbd606081/spark/src/main/java/org/apache/iceberg/spark/SparkUtil.java#L80-L79
   
   In practice, you end up checking:
   ```
   1. Does the name have a single part?
     a. Load defaultCatalog - defaultDatabase - name
   2. Can the first part of the name be treated as a catalog?
     a. Use that catalog, the last element of the name as the table name, and everything in between as the database
     b. If the database is empty, use the default database
   3. Otherwise, use the default catalog
     a. Use the last element as the table name and everything else as the database
     b. If the database is empty, use the default database
   ```
   
   So say we have no table `history`:
   
   `customer.history` refers to the `history` metadata table of `customer`.
   
   If you have a database `customer` containing a table `history`, and an Iceberg table `customer`:
   
   `customer.customer.history` refers to the metadata table (`customer.history` is the other table).
   
   If you have a catalog `customer`, a database `customer`, a table `history`, and a table `customer`:
   
   `customer.customer.customer.history` refers to the metadata table `history`.
   
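
The three rules above can be sketched roughly as follows. This is an illustration only, not the code in `SparkUtil`: the class name, the `Resolved` holder, and the catalog and database names in `main` are all assumptions. It covers only the split into catalog, database, and table name; deciding whether the resulting name is a real table or a metadata table happens afterwards, when the table is loaded.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NameResolutionSketch {

  /** Outcome of splitting a multi-part name: catalog, database parts, table name. */
  static final class Resolved {
    final String catalog;
    final List<String> database;
    final String table;

    Resolved(String catalog, List<String> database, String table) {
      this.catalog = catalog;
      this.database = database;
      this.table = table;
    }

    @Override
    public String toString() {
      return catalog + " | " + String.join(".", database) + " | " + table;
    }
  }

  static Resolved resolve(List<String> parts, Set<String> catalogs,
                          String defaultCatalog, List<String> defaultDatabase) {
    // Rule 1: a single part is a table in the default catalog and database.
    if (parts.size() == 1) {
      return new Resolved(defaultCatalog, defaultDatabase, parts.get(0));
    }
    String table = parts.get(parts.size() - 1);
    // Rule 2: the first part names a catalog; the middle parts are the database.
    if (catalogs.contains(parts.get(0))) {
      List<String> middle = parts.subList(1, parts.size() - 1);
      return new Resolved(parts.get(0), middle.isEmpty() ? defaultDatabase : middle, table);
    }
    // Rule 3: default catalog; everything before the last part is the database.
    return new Resolved(defaultCatalog, parts.subList(0, parts.size() - 1), table);
  }

  public static void main(String[] args) {
    List<String> defaultDb = Arrays.asList("default");
    Set<String> noCatalogs = new HashSet<>();
    Set<String> customerCatalog = new HashSet<>(Arrays.asList("customer"));

    // No catalog named "customer": the first part is read as the database.
    System.out.println(
        resolve(Arrays.asList("customer", "history"), noCatalogs, "spark_catalog", defaultDb));
    // A catalog named "customer" exists: the first part is consumed as the catalog.
    System.out.println(
        resolve(Arrays.asList("customer", "customer", "customer", "history"),
            customerCatalog, "spark_catalog", defaultDb));
  }
}
```

In the second call the table name `history` ends up under database `customer.customer`, which matches the last example above: no real table lives there, so the name can only refer to the metadata table of `customer.customer`.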


[GitHub] [iceberg] jackye1995 commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882703014






[GitHub] [iceberg] electrum commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
electrum commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-882998681


   Do any Iceberg catalog implementations allow users to create a table name with a dot? If we have a `customer` table, then `customer.history` will go to the history table. If a user then creates a `customer.history` table, how would you reference the history table for `customer`? Would existing queries against the history table be broken?


[GitHub] [iceberg] electrum commented on issue #2833: Unified syntax for system table names

Posted by GitBox <gi...@apache.org>.
electrum commented on issue #2833:
URL: https://github.com/apache/iceberg/issues/2833#issuecomment-883026038


   Is the backquote handling part of the Iceberg library? Can you create a table name containing backquotes? Are they then escaped by adding another level of backquotes?
   
   This type of scheme seems reasonable. We'd want to clearly document the rules. The easy advice for users is "don't create tables with dots in the name", which is something that, IIRC, the Hive metastore doesn't allow and that would require quoting anyway, so it seems unlikely that people will do it.

