Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/11/01 21:39:01 UTC

[jira] [Commented] (IMPALA-7778) RCFile parser ignores escape characters

    [ https://issues.apache.org/jira/browse/IMPALA-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672262#comment-16672262 ] 

ASF subversion and git services commented on IMPALA-7778:
---------------------------------------------------------

Commit 95b56d0e2d8232d8707603c360b98a35bb80ff3a in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=95b56d0 ]

IMPALA-7586: fix predicate pushdown of escaped strings

This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans
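
The difference between the raw and the unescaped string can be sketched as
follows. This is a minimal illustration in Python, not Impala's planner code:
the StringLiteral class, the unescape() helper, its simplified escape rules,
and the scan() function are all assumptions made for the example. Only the
name getValueWithOriginalEscapes() comes from the commit message above.

```python
def unescape(raw: str) -> str:
    """Resolve backslash escapes in a raw literal body (simplified rules, assumed)."""
    simple = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Known escapes map to control characters; anything else
            # (quote, backslash, ...) stands for itself.
            out.append(simple.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


class StringLiteral:
    """Hypothetical stand-in for a parsed SQL string literal."""

    def __init__(self, raw: str):
        self._raw = raw

    def get_value_with_original_escapes(self) -> str:
        # The text exactly as the parser saw it, escapes included.
        return self._raw

    def get_unescaped_value(self) -> str:
        # The value the user actually meant.
        return unescape(self._raw)


def scan(rows, wanted):
    """Hypothetical storage-layer equality filter."""
    return [r for r in rows if r == wanted]


# Literal written in SQL as 'a\"b': the raw body is a, backslash, quote, b.
lit = StringLiteral('a\\"b')
stored_rows = ['a"b', 'a\\"b']

buggy = scan(stored_rows, lit.get_value_with_original_escapes())
fixed = scan(stored_rows, lit.get_unescaped_value())
```

In this sketch the buggy pushdown matches the row that still contains the
backslash, while the fixed pushdown matches the row holding the three
characters the user intended.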

There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.

This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.

To make bugs like this more obvious in the future, I renamed
getValue() to getValueWithOriginalEscapes().

Testing:
Added a regression test that covers handling of backslash escapes on
all file formats. I did not add a regression test for the data source
bug because it appears to require major modification of the data
source test infrastructure.

Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Reviewed-on: http://gerrit.cloudera.org:8080/11814
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> RCFile parser ignores escape characters
> ---------------------------------------
>
>                 Key: IMPALA-7778
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7778
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 3.1.0
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: correctness
>
> If an RCFile table specifies an escape character, Impala ignores it when reading the table.
> {code}
> -- HIVE
> CREATE TABLE rc_escape ( s string)
> ROW FORMAT delimited fields terminated by ','  escaped by '\\'
> STORED AS RCFILE;
> insert into rc_escape select '\\"';
> select length(s), s from rc_escape;
> -- +-----+-----+
> -- | c0  |  s  |
> -- +-----+-----+
> -- | 2   | \"  |
> -- +-----+-----+
> -- IMPALA
> invalidate metadata rc_escape;
> select length(s), s from rc_escape;
> -- +-----------+-----+
> -- | length(s) | s   |
> -- +-----------+-----+
> -- | 3         | \\" |
> -- +-----------+-----+
> {code}
> I reproduced on my dev env with "beeline -n $USER -u jdbc:hive2://localhost:11050/default" for Hive and "impala-shell" for Impala.
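
One reading consistent with the output above is that the writer stores the
backslash in escaped form and a reader that honors the escape character
removes it again, while a reader that ignores it returns the stored bytes
unchanged. The Python sketch below illustrates that reading only. The on-disk
representation and the helper names escape_field() and unescape_field() are
assumptions for the example, not taken from Hive's or Impala's RCFile code.

```python
ESCAPE = "\\"  # escape character from the table definition
DELIM = ","    # field terminator from the table definition


def escape_field(value: str) -> str:
    """Prefix the escape character and the delimiter with the escape character."""
    out = []
    for ch in value:
        if ch in (ESCAPE, DELIM):
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def unescape_field(stored: str) -> str:
    """Drop each escape character and keep the character that follows it."""
    out = []
    i = 0
    while i < len(stored):
        if stored[i] == ESCAPE and i + 1 < len(stored):
            i += 1  # skip the escape, emit the escaped character below
        out.append(stored[i])
        i += 1
    return "".join(out)


value = '\\"'                  # backslash followed by a double quote
stored = escape_field(value)   # what the writer is assumed to put on disk

honors_escapes = unescape_field(stored)  # reader that applies the escape character
ignores_escapes = stored                 # reader that skips unescaping
```

Under these assumptions the reader that honors escapes returns a string of
length 2, and the reader that ignores them returns a string of length 3,
matching the two result sets in the report.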


