Posted to issues@hive.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/04/09 20:50:00 UTC
[jira] [Assigned] (HIVE-23172) Quoted Backtick Columns Are Not Parsing Correctly
[ https://issues.apache.org/jira/browse/HIVE-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor reassigned HIVE-23172:
-------------------------------------
> Quoted Backtick Columns Are Not Parsing Correctly
> -------------------------------------------------
>
> Key: HIVE-23172
> URL: https://issues.apache.org/jira/browse/HIVE-23172
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Critical
>
> I recently came across some odd behavior while examining failures of {{special_character_in_tabnames_2.q}} while working on HIVE-23150. I was surprised to see it fail because I couldn't see any reason why it should. It runs pretty standard SQL statements just like every other test, but this test is just a *little bit* different from most others, and that brought this issue to light.
> It turns out that the parsing of table names is pretty much wrong across the board.
> The statement that caught my attention was this:
> {code:sql}
> DROP TABLE IF EXISTS `s/c`;
> {code}
> And here is the relevant grammar:
> {code:none}
> fragment
> RegexComponent
> : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
> | PLUS | STAR | QUESTION | MINUS | DOT
> | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
> | BITWISEXOR | BITWISEOR | DOLLAR | '!'
> ;
> Identifier
> :
> (Letter | Digit) (Letter | Digit | '_')*
> | {allowQuotedId()}? QuotedIdentifier /* though at the language level we allow all Identifiers to be QuotedIdentifiers;
> at the API level only columns are allowed to be of this form */
> | '`' RegexComponent+ '`'
> ;
> fragment
> QuotedIdentifier
> :
> '`' ( '``' | ~('`') )* '`' { setText(StringUtils.replace(getText().substring(1, getText().length() -1 ), "``", "`")); }
> ;
> {code}
> The mystery for me was that, for some reason, the string {{`s/c`}} was being stripped of its back ticks. Every other test I investigated did not have this behavior; the back ticks were always preserved around the table name. The main Hive Java code base would see the back ticks and deal with them internally. For HIVE-23150, I introduced some sanity checks, and they were failing because they expected the back ticks to be present.
> With the help of HIVE-23171, I finally figured it out. Pretty much every table name is hitting the {{RegexComponent}} rule, and the back ticks are carried forward. However, in {{`s/c`}} the forward slash {{/}} is not allowed by {{RegexComponent}}, so the name hits the {{QuotedIdentifier}} rule instead, which trims the back ticks.
> I validated this by disabling {{QuotedIdentifier}}. When I did this, {{`s/c`}} fails with an error but {{`sc`}} parses successfully, because {{`sc`}} is being treated as a {{RegexComponent}}.
> So, if you have {{allowQuotedId}} disabled, table names can only use the characters defined in the {{RegexComponent}} rule (anything else is an error), and the back ticks are *not* stripped. If you have {{allowQuotedId}} enabled and the table name contains a character not specified in {{RegexComponent}}, the name is matched as a {{QuotedIdentifier}} and the back ticks *are* stripped; if all of its characters are part of {{RegexComponent}}, the back ticks are *not* stripped.
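The behavior described above can be sketched as a small Python model. This is only an illustration of the reported outcome, not the ANTLR lexer itself: the function name {{identifier_text}} and its {{allow_quoted_id}} parameter are hypothetical, the punctuation characters are assumed from the token names in the grammar (PLUS, STAR, QUESTION and so on), and the order of the checks mirrors what was observed rather than ANTLR's actual prediction logic.

```python
import re

# Characters accepted by RegexComponent. The punctuation is an assumption
# inferred from the token names in the grammar (PLUS, STAR, QUESTION, ...).
REGEX_COMPONENT = r"[A-Za-z0-9_+*?\-.()\[\]{}^|$!]"

# '`' RegexComponent+ '`'  -> the token text keeps its back ticks.
REGEX_QUOTED = re.compile(r"`" + REGEX_COMPONENT + r"+`")

# '`' ( '``' | ~('`') )* '`'  -> the rule's action strips the back ticks.
QUOTED_IDENTIFIER = re.compile(r"`(?:``|[^`])*`")


def identifier_text(source, allow_quoted_id=True):
    """Return the token text reported in the issue for a back-ticked name."""
    if REGEX_QUOTED.fullmatch(source):
        # Only RegexComponent characters: back ticks are carried forward.
        return source
    if allow_quoted_id and QUOTED_IDENTIFIER.fullmatch(source):
        # Mirrors the setText(...) action: drop the outer back ticks and
        # turn each doubled back tick into a single one.
        return source[1:-1].replace("``", "`")
    raise ValueError(f"cannot tokenize {source!r} as an identifier")
```

Under these assumptions, {{identifier_text("`sc`")}} returns the name with its back ticks intact, {{identifier_text("`s/c`")}} returns {{s/c}} with the back ticks stripped, and {{identifier_text("`s/c`", allow_quoted_id=False)}} raises an error, which matches the three cases described in the issue.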
--
This message was sent by Atlassian Jira
(v8.3.4#803005)