Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/24 17:59:00 UTC

[jira] [Commented] (IMPALA-7905) ToSqlUtils does not correctly quote lower-case Hive keywords

    [ https://issues.apache.org/jira/browse/IMPALA-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751382#comment-16751382 ] 

ASF subversion and git services commented on IMPALA-7905:
---------------------------------------------------------

Commit 85a8b34645a46038fd217c03e64326b72d9669b5 in impala's branch refs/heads/master from Paul Rogers
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=85a8b34 ]

IMPALA-7905: Hive keywords not quoted for identifiers

Impala often generates SQL for statements using the toSql() call.
Generated SQL is often used during testing or when writing the query
plan. Impala keywords such as "create", when used as identifiers,
must be quoted:

SELECT `select`, `from` FROM `order` ...

The code in ToSqlUtils.getIdentSql() quotes the identifier if it is
an Impala or Hive keyword, or if it does not follow the identifier
pattern. The code uses the Hive lexer to detect a keyword. However, the
code contained a flaw: the Hive lexer expects a case-insensitive input
stream, but we passed it the identifier as written, which is
case-sensitive. As a result, "MONTH" was caught as a Hive keyword and
quoted, but "month" was not. This patch fixes that flaw.
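
The flaw and the fix can be sketched as follows. This is a minimal
illustration, not the actual ToSqlUtils code: the class name
IdentQuoteSketch, the tiny keyword set (standing in for the Hive lexer,
which matches upper-case keywords), and the identifier pattern are all
assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Sketch only. HIVE_KEYWORDS is a hypothetical stand-in for the Hive lexer.
final class IdentQuoteSketch {
  // Deliberately tiny, illustrative keyword list (upper case only).
  private static final Set<String> HIVE_KEYWORDS = new HashSet<>(
      Arrays.asList("SELECT", "FROM", "ORDER", "MONTH", "YEAR", "KEY"));

  // Assumed identifier pattern: a letter or underscore, then word characters.
  private static final Pattern IDENT =
      Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  // Flawed check: a case-sensitive lookup misses "month".
  static boolean isKeywordCaseSensitive(String ident) {
    return HIVE_KEYWORDS.contains(ident);
  }

  // Fixed check: normalize to upper case before the lookup.
  static boolean isKeyword(String ident) {
    return HIVE_KEYWORDS.contains(ident.toUpperCase(Locale.ROOT));
  }

  // Quote the identifier if it is a keyword or does not match the pattern.
  // The original spelling is preserved inside the back-ticks.
  static String getIdentSql(String ident) {
    if (isKeyword(ident) || !IDENT.matcher(ident).matches()) {
      return "`" + ident + "`";
    }
    return ident;
  }
}
```

Only the keyword test is case-normalized; the identifier itself is
emitted with its original case.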

This patch also fixes:

IMPALA-8051: Compute stats fails on a column with comment character in
name

The code uses the Hive lexical analyzer to check names. Since "#" and
"--" are comment characters, a name like "foo#" is lexed as "foo", which
does not need quotes. Hence we did not quote "foo#", and statements such
as compute stats failed on that column. Added a special check for "#"
and "--" to resolve this issue.
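
A check of this kind might look like the sketch below. The class and
method names are hypothetical and are not taken from the patch; the
sketch covers only the comment-character test, not the keyword or
identifier-pattern checks.

```java
// Sketch only: names are illustrative, not the actual Impala code.
final class CommentCharSketch {
  // Returns true if the name contains "#" or "--", which a lexer would
  // treat as the start of a comment and silently drop along with
  // everything after it.
  static boolean containsCommentChars(String ident) {
    return ident.contains("#") || ident.contains("--");
  }

  // Force quoting for such names before any lexer-based check runs.
  static String quoteIfCommentChars(String ident) {
    return containsCommentChars(ident) ? "`" + ident + "`" : ident;
  }
}
```

Running this test before the lexer-based check means the lexer never
sees a name that it would truncate.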

Testing:

* Refactored getIdentSql() for easier testing.
* Added tests to the recently added ToSqlUtilsTest for this case and
  several others.
* This change caused the columns `month`, `year`, and `key` to be
  quoted where previously they were not. Updated many tests as a result.
* Added a new identSql() function, for use in tests, to match the
  quoting that Impala uses, and to handle the wildcard, and multi-part
  names. Used this in ToSqlTest to handle the quoted names.
* PlannerTest emits statement SQL to the output file wrapped to 80
  columns and sometimes leaves trailing spaces at the end of the line.
  Some tools remove that trailing space, resulting in trivial file
  differences.  Fixed this to remove trailing spaces in order to simplify
  file comparisons.
* Tweaked the "In pipelines" output to avoid trailing spaces when no
  pipelines are listed.
* Reran all FE tests.
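
The trailing-space cleanup mentioned in the PlannerTest item can be
illustrated with a short sketch. The class and method names here are
hypothetical; the patch's own implementation may differ.

```java
import java.util.regex.Pattern;

// Sketch only: strips spaces and tabs at the end of each line.
final class TrailingSpaceSketch {
  // One or more spaces or tabs that are immediately followed by a line
  // break or by the end of the input.
  private static final Pattern TRAILING =
      Pattern.compile("[ \\t]+(?=\\r?\\n|$)");

  static String stripTrailingSpaces(String text) {
    return TRAILING.matcher(text).replaceAll("");
  }
}
```

Spaces inside a line are left alone, so wrapped SQL keeps its
indentation and only the end-of-line padding is removed.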

Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Reviewed-on: http://gerrit.cloudera.org:8080/12009
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> ToSqlUtils does not correctly quote lower-case Hive keywords
> ------------------------------------------------------------
>
>                 Key: IMPALA-7905
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7905
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Not sure yet how to reproduce this error via the shell, but here is the code analysis.
> The {{ToSqlUtils}} class generates a {{CREATE TABLE}} statement, which uses a method {{getIdentSql()}} to possibly quote a table or column name. This same method is used in multiple places in the {{toSql()}} logic for various statements.
> The comment for the method says:
> bq. returns an identifier lexable by Impala and Hive, possibly by enclosing the original identifier in "`" quotes.
> To check for a Hive-compatible identifier, the code uses the Hive lexer:
> {code:java}
>     HiveLexer hiveLexer = new HiveLexer(new ANTLRStringStream(ident));
> {code}
> A unit test shows that this logic fails to catch lower-case keywords such as "select", while it does catch upper-case keywords such as "SELECT".
> Checking the Hive source, it appears we're using the lexer wrong:
> {code:java}
>     HiveLexerX lexer = new HiveLexerX(new ANTLRNoCaseStringStream(command));
> {code}
> The fix is simple: upper-case the symbol before using the Hive lexer.
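
For illustration, the idea behind a no-case character stream can be
sketched without ANTLR. The NoCaseStream class below is a hypothetical,
self-contained analog and not Hive's ANTLRNoCaseStringStream: it
upper-cases only the lookahead character that a lexer would inspect,
while leaving the underlying text untouched. It assumes the keyword
passed to matchesKeyword() is already upper case.

```java
// Sketch only: a self-contained analog of a case-insensitive stream.
final class NoCaseStream {
  private final String text;
  private int pos = 0;

  NoCaseStream(String text) {
    this.text = text;
  }

  // Lookahead: the upper-cased character i positions ahead (1-based),
  // or -1 at end of input.
  int la(int i) {
    int idx = pos + i - 1;
    if (idx >= text.length()) return -1;
    return Character.toUpperCase(text.charAt(idx));
  }

  void consume() {
    pos++;
  }

  // The original text, with its case preserved.
  String text() {
    return text;
  }

  // Returns true if the entire input matches the upper-case keyword.
  static boolean matchesKeyword(String input, String upperKeyword) {
    NoCaseStream s = new NoCaseStream(input);
    for (int k = 0; k < upperKeyword.length(); k++) {
      if (s.la(1) != upperKeyword.charAt(k)) return false;
      s.consume();
    }
    return s.la(1) == -1;
  }
}
```

Either approach, upper-casing the symbol up front or wrapping it in a
stream like this, gives the lexer the upper-case view it expects.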


