Posted to dev@impala.apache.org by Jin Chul Kim <ji...@gmail.com> on 2017/11/21 14:51:09 UTC

[IMPALA-3942] A question on frontend

Hi,

https://issues.apache.org/jira/browse/IMPALA-3942

Do we need to distinguish between single-quoted and double-quoted string
literals when the frontend builds an AST?

I was thinking about how to solve the problem in IMPALA-3942 when this
question came up.

1. Symptom
create table t1 (original string);
insert into t1 values('That\\\'s it!');
create view v1 as select regexp_replace(original, "\\\\'","'") as replaced, * from t1;
select * from v1; -- parse error internally

2. Cause
I think the root cause is that double-quoted string literals (i.e.
"\\\\'" and "'") are converted to single-quoted string literals when the
query string is regenerated from the AST. The regenerated select query
therefore has a syntax error. Please see the result of "show create table
v1" below. The "create view v2" query works on Hive because Hive keeps the
double-quoted string literals as written.

hive> show create table v1;
CREATE VIEW `v1` AS SELECT regexp_replace(original, '\\\\'', ''') replaced,
* FROM jc.t1
hive> show create table v2;
CREATE VIEW `v2` AS select regexp_replace(`t1`.`original`, "\\\\'","'") as
`replaced`, `t1`.`original` from `jc`.`t1`
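
To make the cause concrete, here is a minimal Java sketch (not Impala code).
It assumes the two scanner rules from sql-scanner.flex can be translated
directly into java.util.regex patterns; the class QuoteRoundTrip and the
helper naiveRequote are hypothetical names used only for illustration. It
shows that swapping the outer quotes without re-escaping the contents yields
text that the single-quote rule no longer accepts as one literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Impala.
class QuoteRoundTrip {
    // Assumed java.util.regex translation of SingleQuoteStringLiteral.
    static final Pattern SINGLE = Pattern.compile("'(\\\\.|[^\\\\'])*'");
    // Assumed java.util.regex translation of DoubleQuoteStringLiteral.
    static final Pattern DOUBLE = Pattern.compile("\"(\\\\.|[^\\\\\"])*\"");

    // True if the whole string is exactly one single-quoted literal.
    static boolean isSingleQuotedLiteral(String s) {
        return SINGLE.matcher(s).matches();
    }

    // True if the whole string is exactly one double-quoted literal.
    static boolean isDoubleQuotedLiteral(String s) {
        return DOUBLE.matcher(s).matches();
    }

    // Strips the outer double quotes and wraps the unchanged contents in
    // single quotes, without escaping any apostrophes inside.
    static String naiveRequote(String doubleQuoted) {
        String body = doubleQuoted.substring(1, doubleQuoted.length() - 1);
        return "'" + body + "'";
    }
}
```

Under these assumptions, the literal "'" is a valid double-quoted token, but
its naively re-quoted form ''' is not a single valid single-quoted token,
which matches the parse error seen when the view is expanded.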

3. (Possible) Solution
I am not sure whether this approach has any side effects. Do you think it
is valid?

My initial idea is to keep information that distinguishes single-quoted
from double-quoted string literals. The StringLiteral class could have a
boolean flag recording which quote style was used. When toSql* is invoked,
the quote style is determined by the flag. Currently our lexical analyzer
keeps only the contents of the string literal and drops the quote style.
In sql-scanner.flex:
SingleQuoteStringLiteral = \'(\\.|[^\\\'])*\'
DoubleQuoteStringLiteral = \"(\\.|[^\\\"])*\"

{SingleQuoteStringLiteral} {
  return newToken(SqlParserSymbols.STRING_LITERAL, yytext().substring(1, yytext().length()-1));
}

{DoubleQuoteStringLiteral} {
  return newToken(SqlParserSymbols.STRING_LITERAL, yytext().substring(1, yytext().length()-1));
}
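
A rough Java sketch of the flag idea follows. It is only an illustration
under stated assumptions: QuotedStringLiteral, fromToken and this toSql are
hypothetical names, not Impala's actual StringLiteral API, and the sketch
assumes the contents between the quotes are stored exactly as written, with
escapes untouched.

```java
// Hypothetical stand-in for a string literal AST node; not Impala's class.
final class QuotedStringLiteral {
    // Text between the quotes, kept exactly as written (escapes untouched).
    private final String value;
    // True if the literal was written with double quotes.
    private final boolean doubleQuoted;

    QuotedStringLiteral(String value, boolean doubleQuoted) {
        this.value = value;
        this.doubleQuoted = doubleQuoted;
    }

    // Builds a literal from the full token text, including its outer quotes.
    static QuotedStringLiteral fromToken(String tokenText) {
        int len = tokenText.length();
        if (len < 2) {
            throw new IllegalArgumentException("not a quoted literal: " + tokenText);
        }
        char quote = tokenText.charAt(0);
        if ((quote != '\'' && quote != '"') || tokenText.charAt(len - 1) != quote) {
            throw new IllegalArgumentException("not a quoted literal: " + tokenText);
        }
        return new QuotedStringLiteral(tokenText.substring(1, len - 1), quote == '"');
    }

    boolean isDoubleQuoted() {
        return doubleQuoted;
    }

    // Regenerates the SQL text with the same quote style as the input.
    String toSql() {
        String quote = doubleQuoted ? "\"" : "'";
        return quote + value + quote;
    }
}
```

With this design the regenerated text is identical to the original token, so
a double-quoted "'" stays "'" instead of becoming '''. The alternative would
be to always emit single quotes and escape embedded apostrophes, which avoids
the extra flag but changes the stored text.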

4. Further question
Most RDBMSes support only single-quoted string literals (not
double-quoted). Hive, however, supports both, which causes problems such as
migration issues and differing behavior. Why does Impala support this
feature as well? Is it just for compatibility with Hive, or is there
another reason?

I found an article "Hive: Allows Single and Double Quotes Interchangeably".
The author said "do not use double quote". What do you think about that?
http://www.thedatastudio.net/hive_flexible_quotes.htm

Best regards,
Jinchul

Re: [IMPALA-3942] A question on frontend

Posted by Jim Apple <jb...@cloudera.com>.
I responded to #3 on the JIRA.

For #4, I'd guess that the answer is compatibility with Hive, which I
think we should keep in this case so as not to break Impala users'
workflows.