Posted to issues@impala.apache.org by "Zoltán Borók-Nagy (JIRA)" <ji...@apache.org> on 2018/03/02 18:48:00 UTC
[jira] [Resolved] (IMPALA-6258) Uninitialized tuple pointers in row batch for empty rows
[ https://issues.apache.org/jira/browse/IMPALA-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltán Borók-Nagy resolved IMPALA-6258.
---------------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0
Fixed in https://github.com/apache/impala/commit/0eaab69fff82a62fbddaae8a0d4ee7a4302ee715
> Uninitialized tuple pointers in row batch for empty rows
> --------------------------------------------------------
>
> Key: IMPALA-6258
> URL: https://issues.apache.org/jira/browse/IMPALA-6258
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.11.0
> Reporter: Michael Ho
> Assignee: Zoltán Borók-Nagy
> Priority: Critical
> Labels: correctness
> Fix For: Impala 2.12.0
>
>
> During [code review|https://gerrit.cloudera.org/#/c/8623/] of IMPALA-6187, it was noticed that the tuple pointers in the generated row batches may not be initialized if a tuple has byte size 0. It is unclear whether there are edge cases in which the code dereferences these uninitialized tuple pointers. In addition, some code compares these uninitialized pointers against the NULL value, so leaving them uninitialized may produce wrong (and non-deterministic) results:
> {noformat}
> BooleanVal TupleIsNullPredicate::GetBooleanVal(
>     ScalarExprEvaluator* evaluator, const TupleRow* row) const {
>   int count = 0;
>   for (int i = 0; i < tuple_idxs_.size(); ++i) {
>     count += row->GetTuple(tuple_idxs_[i]) == NULL;
>   }
>   // Return true only if all originally specified tuples are NULL. Return false if any
>   // tuple is non-nullable.
>   return BooleanVal(count == tuple_ids_.size());
> }
> {noformat}
> [~tarmstrong] came up with the following example:
> {noformat}
> SELECT /* +straight_join */ COUNT(t1.id)
> FROM functional.alltypessmall t1
> LEFT OUTER JOIN (
> SELECT /* +straight_join */ IFNULL(t2.int_col, 1) AS c
> FROM functional.alltypessmall t2
> LEFT OUTER JOIN functional.alltypestiny t3 ON t2.id < 1000
> ) v ON t1.int_col = v.c;
> {noformat}
> The relevant part of the plan is:
> {noformat}
> | 04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] |
> | | hash predicates: t1.int_col = if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)) |
> | | fk/pk conjuncts: assumed fk/pk |
> | | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
> | | tuple-ids=0,1N,2N row-size=16B cardinality=100 |
> | | |
> | |--08:EXCHANGE [HASH(if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)))] |
> | | | mem-estimate=0B mem-reservation=0B |
> | | | tuple-ids=1,2N row-size=8B cardinality=100 |
> | | | |
> | | F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 |
> | | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
> | | 03:NESTED LOOP JOIN [LEFT OUTER JOIN, BROADCAST] |
> | | | join predicates: t2.id < 1000 |
> | | | mem-estimate=0B mem-reservation=0B |
> | | | tuple-ids=1,2N row-size=8B cardinality=100 |
> | | | |
> | | |--06:EXCHANGE [BROADCAST] |
> | | | | mem-estimate=0B mem-reservation=0B |
> | | | | tuple-ids=2 row-size=0B cardinality=8 |
> | | | | |
> | | | F02:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 |
> | | | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
> | | | 02:SCAN HDFS [functional.alltypestiny t3, RANDOM] |
> | | | partitions=4/4 files=4 size=460B |
> | | | stats-rows=8 extrapolated-rows=disabled |
> | | | table stats: rows=8 size=unavailable |
> | | | column stats: all |
> | | | mem-estimate=32.00MB mem-reservation=0B |
> | | | tuple-ids=2 row-size=0B cardinality=8 |
> | | | |
> | | 01:SCAN HDFS [functional.alltypessmall t2, RANDOM] |
> | | partitions=4/4 files=4 size=6.32KB |
> | | stats-rows=100 extrapolated-rows=disabled |
> | | table stats: rows=100 size=unavailable |
> | | column stats: all |
> | | mem-estimate=32.00MB mem-reservation=0B |
> | | tuple-ids=1 row-size=8B cardinality=100 |
>
> {noformat}
> We should fix this by making these empty tuples point to a dummy non-NULL address.
> Alex came up with this query, which currently produces non-deterministic results (the second variant runs against the Kudu tables):
> {noformat}
> select count(v.x) from functional.alltypestiny t3 left outer join (select true as x from functional.alltypestiny t1 left outer join functional.alltypestiny t2 on (true)) v on (v.x = t3.bool_col) where t3.bool_col = true;
> {noformat}
> {noformat}
> select count(v.x) from functional_kudu.alltypestiny t3 left outer join (select true as x from functional_kudu.alltypestiny t1 left outer join functional_kudu.alltypestiny t2 on (true)) v on (v.x = t3.bool_col) where t3.bool_col = true;
> {noformat}
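
The fix proposed in the description, pointing zero-byte tuples at a dummy non-NULL pointer, can be sketched as follows. This is a minimal illustration and not the code from the commit linked above: Tuple, TupleRow, kDummyTuple, TuplePtrForRow and AllTuplesNull are hypothetical stand-ins, and the null check is a simplified version of the quoted TupleIsNullPredicate logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Impala's Tuple and TupleRow; not the real classes.
struct Tuple {};

struct TupleRow {
  std::vector<Tuple*> tuples;
  Tuple* GetTuple(int idx) const { return tuples[idx]; }
};

// Shared placeholder that zero-byte tuples point at (assumed name). It is never
// dereferenced; it only has to compare unequal to nullptr.
static Tuple kDummyTuple;

// Returns the pointer to store in a row for a tuple of the given byte size.
// 'buffer' backs non-empty tuples and may be nullptr when byte_size is 0.
Tuple* TuplePtrForRow(uint8_t* buffer, int byte_size) {
  assert(byte_size >= 0);
  if (byte_size == 0) return &kDummyTuple;  // deterministic, non-NULL
  assert(buffer != nullptr);
  return reinterpret_cast<Tuple*>(buffer);
}

// Simplified null check: true only if every listed tuple pointer is NULL.
bool AllTuplesNull(const TupleRow& row, const std::vector<int>& tuple_idxs) {
  int count = 0;
  for (int idx : tuple_idxs) count += row.GetTuple(idx) == nullptr;
  return count == static_cast<int>(tuple_idxs.size());
}
```

Under these assumptions, a zero-byte tuple that was actually produced never compares equal to NULL, so a TupleIsNull-style check reports NULL only for tuples that the outer join explicitly set to NULL.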