Posted to issues@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2020/04/07 13:13:00 UTC

[jira] [Commented] (HIVE-23149) Consistency of Parsing Object Identifiers

    [ https://issues.apache.org/jira/browse/HIVE-23149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077222#comment-17077222 ] 

Zoltan Haindrich commented on HIVE-23149:
-----------------------------------------

bq. 2// Hive throws an exception if there is a period in the table name.  This is an invalid response: table names may contain a period. More likely than not, it will throw a 'table not found' exception, since the user most likely used backticks incorrectly by accident and meant to specify a db and a table separately. HIVE-16907

We *do* throw an exception for a reason: if we don't, things do not work correctly internally. Fixing this would involve using TableName everywhere, and only then can we lower the safety net to allow '.' in table names.
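
To illustrate why a structured name matters here, below is a minimal sketch of the MySQL-style rule quoted in the description: a period inside backticks belongs to the name, while a period outside backticks separates components. This is Python rather than Hive's Java, and it is not Hive's parser. `QualifiedName`, `split_identifier` and `parse_table_name` are hypothetical names invented for this example (they are not Hive's TableName API), and treating a doubled backtick as an escaped backtick is an assumption borrowed from MySQL.

```python
from typing import List, NamedTuple, Optional


class QualifiedName(NamedTuple):
    """Hypothetical structured name; NOT Hive's actual TableName class."""
    db: Optional[str]
    table: str


def split_identifier(text: str) -> List[str]:
    """Split an identifier on periods that appear outside backticks.

    Assumption (MySQL convention): a doubled backtick inside a quoted
    part stands for one literal backtick.
    """
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "`":
            if in_quotes and text[i + 1:i + 2] == "`":
                current.append("`")  # escaped backtick, keep one copy
                i += 1               # skip the second backtick
            else:
                in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted period ends the current component.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated backtick in identifier: " + text)
    parts.append("".join(current))
    return parts


def parse_table_name(text: str) -> QualifiedName:
    """Return (db, table); db is None when only one component is given."""
    parts = split_identifier(text)
    if len(parts) == 1:
        return QualifiedName(None, parts[0])
    if len(parts) == 2:
        return QualifiedName(parts[0], parts[1])
    raise ValueError("expected [db.]table, got: " + text)
```

With this sketch, `default.mytable` in a single pair of backticks yields one component with no database, while quoting `default` and `mytable` separately yields a database and a table. Once the two are carried as separate fields, nothing downstream has to re-split a dotted string, which is the ambiguity the current exception guards against.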


> Consistency of Parsing Object Identifiers
> -----------------------------------------
>
>                 Key: HIVE-23149
>                 URL: https://issues.apache.org/jira/browse/HIVE-23149
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Critical
>
> There needs to be better consistency in the handling of object identifiers (databases, tables, columns, views, functions, etc.).  I think it makes sense to standardize on the same rules that MySQL/MariaDB use for their column names so that Hive can be more of a drop-in replacement for these.
>  
> The two important things to keep in mind are:
>  
> 1// Permitted characters in quoted identifiers include the full Unicode Basic Multilingual Plane (BMP), except U+0000
>  
> 2// If any components of a multiple-part name require quoting, quote them individually rather than quoting the name as a whole. For example, write {{`my-table`.`my-column`}}, not {{`my-table.my-column`}}.  
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://dev.mysql.com/doc/refman/8.0/en/identifier-qualifiers.html]  
>  
> That is to say:
>  
> {code:sql}
> -- Select all rows from a table named `default.mytable`
> -- (Yes, the table name itself has a period in it. This is valid)
> SELECT * FROM `default.mytable`;
>  
> -- Select all rows from database `default`, table `mytable`
> SELECT * FROM `default`.`mytable`;  
> {code}
>  
> This plays out in a couple of ways.  There may be more, but these are the ones I know about already:
>  
> 1// Hive generates incorrect syntax: [HIVE-23128]
>  
> 2// Hive throws an exception if there is a period in the table name.  This is an invalid response: table names may contain a period. More likely than not, it will throw a 'table not found' exception, since the user most likely used backticks incorrectly by accident and meant to specify a db and a table separately. [HIVE-16907]
>  
> Once we have the parsing figured out and support for backticks to enclose UTF-8 strings, the backend database then needs to actually support the UTF-8 character set.  It currently does not: [HIVE-1808]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)