Posted to issues@hive.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/04/09 20:50:00 UTC

[jira] [Assigned] (HIVE-23172) Quoted Backtick Columns Are Not Parsing Correctly

     [ https://issues.apache.org/jira/browse/HIVE-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor reassigned HIVE-23172:
-------------------------------------


> Quoted Backtick Columns Are Not Parsing Correctly
> -------------------------------------------------
>
>                 Key: HIVE-23172
>                 URL: https://issues.apache.org/jira/browse/HIVE-23172
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Critical
>
> I recently came across some weird behavior while examining failures of {{special_character_in_tabnames_2.q}} as part of my work on HIVE-23150. I was surprised to see it fail because I couldn't see any reason why it should: it runs pretty standard SQL statements just like every other test. For some reason, though, this test is just a *little bit* different from most others, and that brought this issue to light.
> It turns out that the parsing of table names is pretty much wrong across the board.
> The statement that caught my attention was this:
> {code:sql}
> DROP TABLE IF EXISTS `s/c`;
> {code}
> And here is the relevant grammar:
> {code:none}
> fragment
> RegexComponent
>     : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
>     | PLUS | STAR | QUESTION | MINUS | DOT
>     | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
>     | BITWISEXOR | BITWISEOR | DOLLAR | '!'
>     ;
> Identifier
>     :
>     (Letter | Digit) (Letter | Digit | '_')*
>     | {allowQuotedId()}? QuotedIdentifier  /* though at the language level we allow all Identifiers to be QuotedIdentifiers; 
>                                               at the API level only columns are allowed to be of this form */
>     | '`' RegexComponent+ '`'
>     ;
> fragment    
> QuotedIdentifier 
>     :
>     '`'  ( '``' | ~('`') )* '`' { setText(StringUtils.replace(getText().substring(1, getText().length() -1 ), "``", "`")); }
>     ;
> {code}
> The mystery for me was that, for some reason, the string {{`s/c`}} was being stripped of its back ticks. No other test I investigated had this behavior; the back ticks were always preserved around the table name, and the main Hive Java code base would see the back ticks and deal with them internally. For HIVE-23150, I introduced some sanity checks, and they were failing because they expected the back ticks to be present.
> With the help of HIVE-23171 I finally figured it out. What I discovered is that pretty much every table name hits the {{RegexComponent}} rule, and the back ticks are carried forward. In {{`s/c`}}, however, the forward slash {{/}} is not allowed by {{RegexComponent}}, so the name matches the {{QuotedIdentifier}} rule instead, which trims the back ticks.
> I validated this by disabling {{QuotedIdentifier}}. When I did this, {{`s/c`}} failed with an error but {{`sc`}} parsed successfully, because {{`sc`}} is treated as a {{RegexComponent}}.
> So, if you have {{allowQuotedId}} disabled, table names can only use the characters defined in the {{RegexComponent}} rule (anything else is an error), and the back ticks are *not* stripped. If you have {{allowQuotedId}} enabled, the result depends on the name: if it contains a character not specified in {{RegexComponent}}, it is still identified as a table name and the back ticks *are* stripped; if all of its characters are part of {{RegexComponent}}, the back ticks are *not* stripped.
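
The behavior described above can be modeled in a few lines of Python. This is only a sketch of what the issue reports, not Hive's lexer: the function name {{lex_backticked}} is hypothetical, and the mapping of token names such as {{BITWISEXOR}} and {{DOLLAR}} to the characters {{^}} and {{$}} is an assumption based on their names.

```python
import re

# Characters accepted by the RegexComponent fragment quoted in the issue.
# Assumption: PLUS, STAR, QUESTION, MINUS, DOT, the bracket tokens,
# BITWISEXOR, BITWISEOR and DOLLAR stand for + * ? - . ( ) [ ] { } ^ | $
REGEX_COMPONENT = re.compile(r"[A-Za-z0-9_+*?\-.()\[\]{}^|$!]+")


def lex_backticked(text, allow_quoted_id):
    """Return the token text the issue describes for a back-ticked name.

    Hypothetical model of the reported behavior, not Hive's actual lexer.
    Raises ValueError in the cases where the issue reports a parse error.
    """
    if len(text) < 2 or text[0] != "`" or text[-1] != "`":
        raise ValueError("not a back-ticked identifier: " + text)
    inner = text[1:-1]

    # '`' RegexComponent+ '`' carries no action, so the back ticks survive.
    if REGEX_COMPONENT.fullmatch(inner):
        return text

    # QuotedIdentifier is only reachable when allowQuotedId() is true.
    # Inside it, a back tick may only appear doubled.
    if allow_quoted_id and "`" not in inner.replace("``", ""):
        # Mirrors the setText(...) action: drop the outer back ticks and
        # unescape doubled ones.
        return inner.replace("``", "`")

    raise ValueError("cannot tokenize " + text)
```

Under this model, {{lex_backticked("`sc`", True)}} returns the name with its back ticks intact, {{lex_backticked("`s/c`", True)}} returns {{s/c}} without them, and {{lex_backticked("`s/c`", False)}} raises an error, which matches the three cases observed in the issue.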


