Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/06/21 16:12:00 UTC

[jira] [Comment Edited] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

    [ https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869640#comment-16869640 ] 

Tim Armstrong edited comment on IMPALA-8276 at 6/21/19 4:11 PM:
----------------------------------------------------------------

Yeah, Quanlong was able to fix two issues in this area (he's a brave person). 

Looking at IMPALA-8386, I'm pretty sure this is a dupe of that - the query shapes are the same, except with an inline view vs a regular view. Both result in a bogus x = x predicate in the "other predicates" of a join.

{noformat}
| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
| |  hash predicates: c.a_id = a_id                           |
| |  other predicates: sum(amount) = sum(amount)      <---------- Wrongly inferred predicate which incorrectly rejects NULLs
| |  runtime filters: RF000 <- a_id                           |
{noformat}
versus
{noformat}
505:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                                
...
|  other predicates: t.date1 = t.date1,
                     t.date2 = t.date2,
                     t.date3 = t.date3
{noformat}

I don't feel 100% confident closing it until we've confirmed the fix, though. It's just too subtle, and there may be some slight difference.
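
For anyone wondering why an x = x predicate is harmful rather than a no-op, here is a minimal sketch. It uses SQLite through Python's sqlite3 module, not Impala, and a WHERE clause stands in for the join's "other predicates"; the tables a and b and their contents are made up for illustration. It only demonstrates the three-valued logic involved, not the planner bug itself.

```python
import sqlite3

# Hypothetical tables, loosely modelled on the "a left join b on a.p = b.q" shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (p INTEGER);
    CREATE TABLE b (q INTEGER, z TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 'x'), (2, NULL);
""")

join = "SELECT count(*) FROM a LEFT JOIN b ON a.p = b.q"

# Without the bogus predicate, every row of a survives the outer join.
plain_count = conn.execute(join).fetchone()[0]

# b.z = b.z evaluates to NULL (not true) whenever b.z is NULL, so it drops
# both the row whose z is NULL and the row that had no match in b.
filtered_count = conn.execute(join + " WHERE b.z = b.z").fetchone()[0]

print(plain_count, filtered_count)
```

In this sketch the second count is lower because the predicate discards the null-extended row and the row with a NULL z, which would match the symptom of count(*) returning fewer rows than expected.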


was (Author: tarmstrong):
Yeah, Quanlong was able to fix two issues in this area (he's a brave person). 

Looking at IMPALA-8386, I'm pretty sure this is a dupe of that - the query shapes are the same, except with an inline view vs a regular view. Both result in a bogus x = x predicate in the "other predicates" of a join.

{noformat}
| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
| |  hash predicates: c.a_id = a_id                           |
| |  other predicates: sum(amount) = sum(amount)      <---------- Wrongly inferred predicate which incorrectly rejects NULLs
| |  runtime filters: RF000 <- a_id                           |
{noformat}
versus
{noformat}
505:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                                
|  hash predicates: demand_source_line_id = oola.line_id, io_name = ood.organization_name 
|  other predicates: ooha.ordered_date = ooha.ordered_date,
                     oola.promise_date = oola.promise_date,
                     oola.request_date = oola.request_date
{noformat}

I don't feel 100% confident closing it until we've confirmed the fix, though. It's just too subtle, and there may be some slight difference.

> Self equal to self predicate "x = x" generated by Impala caused incorrect query result
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8276
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8276
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Yongjun Zhang
>            Priority: Blocker
>              Labels: correctness
>
> Reported with cdh5.12.1: a bogus "self equal to self" predicate of the form "x = x" is generated by Impala and causes incorrect query results, because this kind of predicate does not evaluate to true for NULL entries, so those rows are filtered out.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS query, though the query body is the same for both, because the former generated the bogus predicate and the latter did not.
> For example,
> {code:java}
> select count(*) from 
> (select a.*, b.x, b.y, b.z_dt from view1 a left join view2 b on a.p = b.q) a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt from view1 a left join view2 b on a.p = b.q{code}
>  because the predicate {{a.z = a.z_dt}} was created (for reasons yet to be understood; note that b.z_dt is an alias of b.z). It shows up as "table1.z = table1.z" in the query plan in the Impala query profile because a and b are aliases of view1 and view2, both of which are views defined through deep nesting that involves the table table1.
> Although in cdh5.12.1 the select and count queries returned different results in the initial case, an attempted reproduction shows that both queries get the bogus predicate. cdh5.15.2 has the same problem. I was not able to try the most recent master branch of Impala due to metadata incompatibility.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
