Posted to issues@impala.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2017/10/20 21:24:02 UTC

[jira] [Created] (IMPALA-6096) RLIKE/REGEXP operator and regexp_* functions require different character escaping

Harsh J created IMPALA-6096:
-------------------------------

             Summary: RLIKE/REGEXP operator and regexp_* functions require different character escaping
                 Key: IMPALA-6096
                 URL: https://issues.apache.org/jira/browse/IMPALA-6096
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.5.0
            Reporter: Harsh J
            Priority: Minor


When escaping a regex-special character such as {{(}} or {{[}}, the REGEXP/RLIKE operator requires a triple backslash, as in {noformat}\\\({noformat}, while the regexp_like/regexp_extract/etc. functions require only a double backslash, as in {noformat}\\({noformat}.

The following test demonstrates the difference:

{code}
CREATE TABLE test_regexp (a STRING);
INSERT INTO test_regexp VALUES ('This is a (string) with [special chars] that need e.s.c.a.p.i.n.g');

-- Fails: RLIKE rejects the double-escaped parenthesis
SELECT regexp_extract(a, 'This is a \\((.*)\\) with .*', 1) FROM test_regexp WHERE a RLIKE 'This is a \\(string.*';

-- Succeeds: RLIKE accepts the triple-escaped parenthesis
SELECT regexp_extract(a, 'This is a \\((.*)\\) with .*', 1) FROM test_regexp WHERE a RLIKE 'This is a \\\(string.*';
{code}

The failure message is: {noformat}invalid regular expression in 'a RLIKE 'This is a \\(string.*''{noformat}
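
For reference, the pattern the regex engine is presumably meant to receive in both cases carries a single backslash before each parenthesis. The sketch below uses Python's {{re}} module as a stand-in for RE2 (an assumption; the two engines differ in other respects) and hypothetical helper names. It shows that the single-backslash form matches the sample row, and that an unescaped, unclosed {{(}} is rejected as an invalid expression, which is one possible but unconfirmed reading of the RLIKE failure above.

```python
import re

# Sample row from the test table in this report.
ROW = "This is a (string) with [special chars] that need e.s.c.a.p.i.n.g"

# Engine-level patterns: one backslash makes the parenthesis literal.
# Raw strings are used so Python itself does not consume any backslashes.
EXTRACT_PATTERN = r"This is a \((.*)\) with .*"
FILTER_PATTERN = r"This is a \(string.*"


def extract_group(pattern, text, group=1):
    """Return the requested capture group, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


def is_valid_regex(pattern):
    """Report whether the engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(extract_group(EXTRACT_PATTERN, ROW))         # text inside the parentheses
    print(re.search(FILTER_PATTERN, ROW) is not None)  # filter pattern matches the row
    print(is_valid_regex("This is a (string.*"))       # unclosed group is rejected
```

Any backslashes beyond that single one are consumed by whatever layers sit between the SQL text and the regex engine, and the report indicates that the number of such layers differs between the operator and the functions.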

Could the escape format be unified between the two? Per the documentation, both are supposed to follow standard RE2 syntax.


