You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/24 17:59:00 UTC
[jira] [Commented] (IMPALA-8051) Compute stats fails on a column with comment character in name

    [ https://issues.apache.org/jira/browse/IMPALA-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751383#comment-16751383 ] 

ASF subversion and git services commented on IMPALA-8051:
---------------------------------------------------------

Commit 85a8b34645a46038fd217c03e64326b72d9669b5 in impala's branch refs/heads/master from Paul Rogers
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=85a8b34 ]

IMPALA-7905: Hive keywords not quoted for identifiers

Impala often generates SQL for statements using the toSql() call.
Generated SQL is often used during testing or when writing the query
plan. Impala keywords such as "create", when used as identifiers,
must be quoted:

SELECT `select`, `from` FROM `order` ...

The code in ToSqlUtils.getIdentSql() quotes the identifier if it is
an Impala or Hive keyword, or if it does not follow the identifier
pattern. The code uses the Hive lexer to detect a keyword. But, the
code contained a flaw: the lexer expects a case-insensitive input.
We provide a case sensitive input. As a result, "MONTH" is caught as a
Hive keyword and quoted, but "month" is not. This patch fixes that flaw.

This patch also fixes:

IMPALA-8051: Compute stats fails on a column with comment character in
name

The code uses the Hive lexical analyzer to check names. Since "#" and
"--" are comment characters, a name like "foo#" is parsed as "foo" which
does not need quotes, hence we don't quote "foo#", which causes issues.
Added a special check for "#" and "--" to resolve this issue.

Testing:

* Refactored getIdentSql() easier testing.
* Added a tests to the recently added ToSqlUtilsTest for this case and
  several others.
* Making this change caused the columns `month`, `year`, and `key` to be
  quoted when before they were not. Updated many tests as a result.
* Added a new identSql() function, for use in tests, to match the
  quoting that Impala uses, and to handle the wildcard, and multi-part
  names. Used this in ToSqlTest to handle the quoted names.
* PlannerTest emits statement SQL to the output file wrapped to 80
  columns and sometimes leaves trailing spaces at the end of the line.
  Some tools remove that trailing space, resulting in trivial file
  differences.  Fixed this to remove trailing spaces in order to simplify
  file comparisons.
* Tweaked the "In pipelines" output to avoid trailing spaces when no
  pipelines are listed.
* Reran all FE tests.

Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Reviewed-on: http://gerrit.cloudera.org:8080/12009
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Compute stats fails on a column with comment character in name
> --------------------------------------------------------------
>
>                 Key: IMPALA-8051
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8051
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Problem - "compute stats" query executed on a table containing a special character "#" in one of its columns is failing with below error:
> WARNINGS: AnalysisException: Syntax error in line 1:
> ...length(cola)), NDV(colb#) AS colb#, CAST(-1 as BIG...
>                              ^
> Encountered: Unexpected character
> Expected: ADD, ALTER, AND, ARRAY, AS, ASC, BETWEEN, BIGINT, BINARY, BLOCK_SIZE, BOOLEAN, CACHED, CASCADE, CHANGE, CHAR, COMMENT, COMPRESSION, CROSS, DATE, DATETIME, DECIMAL, DEFAULT, DESC, DIV, REAL, DROP, ELSE, ENCODING, END, FLOAT, FOLLOWING, FROM, FULL, GROUP, IGNORE, HAVING, ILIKE, IN, INNER, INTEGER, IREGEXP, IS, JOIN, LEFT, LIKE, LIMIT, LOCATION, MAP, NOT, NULL, NULLS, OFFSET, ON, OR, ORDER, PARTITION, PARTITIONED, PRECEDING, PRIMARY, PURGE, RANGE, RECOVER, REGEXP, RENAME, REPLACE, RESTRICT, RIGHT, RLIKE, ROW, ROWS, SELECT, SET, SMALLINT, SORT, STORED, STRAIGHT_JOIN, STRING, STRUCT, TABLESAMPLE, TBLPROPERTIES, THEN, TIMESTAMP, TINYINT, TO, UNCACHED, UNION, USING, VALUES, VARCHAR, WHEN, WHERE, WITH, COMMA, IDENTIFIER
> Steps to reproduce the issue -
> # Create a table containing special character in one of it columns from Hive. For example:
> {code:sql}
> CREATE TABLE test_special_character (`id#` int);
> {code}
> # Execute "INVALIDATE METADATA test_special_character" from Impala.
> # Execute "COMPUTE STATS test_special_character" from Impala and it'll lead to above mentioned error.
> Impala does not allow to create tables with columns containing special characters but Hive allows it by using back ticks (``) to escape it. However, Impala still can load the metadata of table and can read from column containing special character as well by escaping the special character using back ticks (``). For example, below query can be executed from Impala -
> {code:sql}
> select `id#` from test_special_character;
> {code}
> However, when "compute stats" command is executed, the query triggered by this command to compute column-level stats does not use back ticks to escape the special character present in one of the columns as it does not know that column contains a special character and this is the cause of this issue.
> (Reported by a user.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org