Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/11/01 21:39:01 UTC

[jira] [Commented] (IMPALA-7586) Incorrect results when querying primary = "\"" in Kudu and HBase

    [ https://issues.apache.org/jira/browse/IMPALA-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672260#comment-16672260 ] 

ASF subversion and git services commented on IMPALA-7586:
---------------------------------------------------------

Commit 95b56d0e2d8232d8707603c360b98a35bb80ff3a in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=95b56d0 ]

IMPALA-7586: fix predicate pushdown of escaped strings

This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans
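To make this class of bug concrete, here is a minimal Python sketch. It is not Impala's planner code. It assumes a simplified escape rule in which a backslash escapes the following character, and it models the storage-side equality filter with a hypothetical pushed_down_equals helper. For the literal "\"" from the report, the parser sees the two-character body \" while the stored value is a single " character:

```python
import re

# Hypothetical illustration; not Impala's planner code.
# Simplified escape rule assumed for this sketch: a backslash escapes the
# next character, so \" becomes " and \\ becomes \.
_ESCAPE = re.compile(r"\\(.)")


def unescape(raw_body: str) -> str:
    """Turn the text between a literal's quotes into the value it denotes."""
    return _ESCAPE.sub(r"\1", raw_body)


def pushed_down_equals(stored_values, value):
    """Model a storage-side (Kudu/HBase/data source) equality filter."""
    return [v for v in stored_values if v == value]


stored = ['"']    # INSERT INTO test VALUES ("\"") stores one double-quote char
raw_body = '\\"'  # what the parser sees between the quotes: backslash + quote

buggy = pushed_down_equals(stored, raw_body)            # raw string pushed down
fixed = pushed_down_equals(stored, unescape(raw_body))  # unescaped string pushed down
print(len(buggy), len(fixed))  # 0 1
```

Pushing the raw string filters out the only matching row, which matches the symptom in the report: the full scan finds the row, but the pushed-down "=" predicate does not.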

There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.

This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around in the added tests.

In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().
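The idea behind the rename can be sketched as follows: give the escaped form an explicit, slightly awkward name so that any call site using it for pushdown stands out in review. This is a hypothetical Python illustration, not Impala's StringLiteral class; the get_unescaped_value accessor and the kudu_predicate_value helper are assumptions, and both forms of the literal are assumed to be supplied by the lexer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringLiteral:
    """Hypothetical literal node; both forms are assumed to come from the lexer."""

    _original: str   # text between the quotes, escapes preserved
    _unescaped: str  # the value the literal actually denotes

    def get_value_with_original_escapes(self) -> str:
        # Deliberately verbose name: escapes are still present, so this form is
        # only safe for regenerating SQL text, never for comparing against data.
        return self._original

    def get_unescaped_value(self) -> str:
        # The form to use when pushing a predicate down to Kudu, HBase or a
        # data source.
        return self._unescaped


def kudu_predicate_value(lit: StringLiteral) -> str:
    # Hypothetical pushdown helper. With a plain getValue() the wrong choice
    # looked harmless; the explicit names make the correct one obvious.
    return lit.get_unescaped_value()


lit = StringLiteral('\\"', '"')
print(lit.get_value_with_original_escapes())  # \"
print(kudu_predicate_value(lit))              # "
```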

Testing:
Added a regression test that covers handling of backslash escapes in all
file formats. I did not add a regression test for the data source bug,
since that appears to require major modifications to the data source
test infrastructure.

Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Reviewed-on: http://gerrit.cloudera.org:8080/11814
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Incorrect results when querying primary = "\"" in Kudu and HBase
> ----------------------------------------------------------------
>
>                 Key: IMPALA-7586
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7586
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Will Berkeley
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: correctness, kudu
>         Attachments: impalakudu_pred_bug.profile
>
>
> Version string from catalogd web ui:
> {noformat}
> catalogd version 3.1.0-cdh6.x-SNAPSHOT RELEASE (build 8baac7f5849b6bacb02fedeb9b3fe2b2ee9450ee)
> {noformat}
> A reproduction script for the impala-shell:
> {noformat}
> create table test(name string, primary key(name) ) stored as kudu;
> insert into test values ("\"");
> -- Modified 1 row(s), 0 row error(s) in 4.01s
> -- row found in full table scan
> select * from test;
> -- Fetched 1 row(s) in 0.15s
> -- row not found on = predicate (pushed to kudu)
> select * from test where name="\"";
> -- Fetched 0 row(s) in 0.13s
> -- row found when predicate cannot be pushed to kudu
> select * from test where name like "\"";
> -- Fetched 1 row(s) in 0.13s
> {noformat}
> This was originally reported as KUDU-2575. I tried to reproduce it directly against Kudu using the Python client, but got the expected result.
> The plan and profile show that Impala pushes the predicate down, but Kudu is never scanned (RowsRead, RowsReturned and TotalKuduScanRoundTrips are all 0), possibly because the Kudu client short-circuits the scan as having no results based on the predicate Impala pushes down.
> {noformat}
> 00:SCAN KUDU [default.test]
>    kudu predicates: name = '"'
>    mem-estimate=0B mem-reservation=0B thread-reservation=1
>    tuple-ids=0 row-size=15B cardinality=unavailable
>    in pipelines: 00(GETNEXT)
> {noformat}
> {noformat}
> KUDU_SCAN_NODE (id=0)
>           - AverageScannerThreadConcurrency: 0.00 (0.0)
>           - InactiveTotalTime: 0ns (0)
>           - KuduRemoteScanTokens: 0 (0)
>           - MaterializeTupleTime(*): 0ns (0)
>           - NumScannerThreadMemUnavailable: 0 (0)
>           - NumScannerThreadsStarted: 1 (1)
>           - PeakMemoryUsage: 24.0 KiB (24576)
>           - PeakScannerThreadConcurrency: 1 (1)
>           - RowBatchBytesEnqueued: 16.0 KiB (16384)
>           - RowBatchQueueGetWaitTime: 0ns (0)
>           - RowBatchQueuePeakMemoryUsage: 0 B (0)
>           - RowBatchQueuePutWaitTime: 0ns (0)
>           - RowBatchesEnqueued: 1 (1)
>           - RowsRead: 0 (0)
> ===>  - RowsReturned: 0 (0)
>           - RowsReturnedRate: 0 per second (0)
>           - ScanRangesComplete: 1 (1)
>           - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
>           - ScannerThreadsTotalWallClockTime: 0ns (0)
>             - ScannerThreadsSysTime: 158.00us (158000)
>             - ScannerThreadsUserTime: 0ns (0)
>           - ScannerThreadsVoluntaryContextSwitches: 2 (2)
> ===>  - TotalKuduScanRoundTrips: 0 (0)
>           - TotalTime: 1ms (1999972)
> {noformat}
> Using the /scans page of the tablet servers, I also confirmed that Kudu sees no scan from Impala for this query.
> Full profile attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
