You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2023/03/18 19:45:00 UTC

[jira] [Assigned] (IMPALA-12006) Left outer join cardinality highly overestimated

     [ https://issues.apache.org/jira/browse/IMPALA-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha reassigned IMPALA-12006:
-----------------------------------

    Assignee: Aman Sinha

> Left outer join cardinality highly overestimated
> ------------------------------------------------
>
>                 Key: IMPALA-12006
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12006
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.2.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>
> In one of the use cases, we have seen the cardinality estimate for left outer join highly overestimated.  The plan is complex and only a partial output is shown below (with the column names anonymized): 
> {noformat}
>   57:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
>   |  hash-table-id=121
>   |  hash predicates: a.id = a.id
>   |  fk/pk conjuncts: none
>   |  mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
>   |  tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N,29N,31N row-size=2.63KB cardinality=3.90T
>   |  in pipelines: 06(GETNEXT), 26(OPEN)
>   |
>   |--F1253:PLAN FRAGMENT hosts=13 instances=13
>   |  |  Per-Instance Resources: mem-estimate=1.10GB mem-reservation=204.00MB thread-reservation=1
>   |  JOIN BUILD
>   |  |  join-table-id=121 plan-id=122 cohort-id=25
>   |  |  build expressions: a.id
>   |  |  mem-estimate=1.08GB mem-reservation=204.00MB spill-buffer=2.00MB thread-reservation=0
>   |  |
>   |  1758:EXCHANGE [BROADCAST]
>   |  |  mem-estimate=20.87MB mem-reservation=0B thread-reservation=0
>   |  |  tuple-ids=29,31 row-size=85B cardinality=9.56M
>   |  |  in pipelines: 26(GETNEXT)
>   |  |
>   ...
>   ...
>   ...
>   56:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
>   |  hash predicates: ifnull(a.id, a.id) = a.id
>   |  fk/pk conjuncts: assumed fk/pk
>   |  mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
>   |  tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N row-size=2.55KB cardinality=14.97G
>   |  in pipelines: 06(GETNEXT), 22(OPEN)
> {noformat}
> Note that the left input of the join is estimated as 14.97G rows, right input as 9.56M rows but the LOJ estimate is 3.9T rows.  We need to investigate why that is so and fix it.  The NDV of the based column involved in the join is 36661  but in the lower join there are functions involved in the join condition. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org