Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2023/03/23 08:52:15 UTC

[jira] [Updated] (FLINK-28120) Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then best cost of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]

     [ https://issues.apache.org/jira/browse/FLINK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xintong Song updated FLINK-28120:
---------------------------------
    Fix Version/s: 1.18.0
                       (was: 1.17.0)

>
>                 Key: FLINK-28120
>                 URL: https://issues.apache.org/jira/browse/FLINK-28120
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: luoyuxia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.18.0
>
>         Attachments: 截屏2022-06-18 上午11.48.46.png
>
>
> When I run the following SQL with the Hive dialect:
>  
> {code:sql}
> create table src(key string, value string);
> SELECT key, value FROM
> (
>   SELECT key, value FROM src
>   UNION ALL
>   SELECT key, key as value FROM ( 
>     SELECT distinct key FROM (
>       SELECT key, value FROM (
>         SELECT key, value FROM src
>         UNION ALL
>         SELECT key, value FROM src
>       )t1 
>     group by key, value)t2
>   )t3
> )t4
> group by key, value {code}
>  
>  
> it throws the following exception:
>  
> {code:java}
> Caused by: java.lang.AssertionError: rel [rel#1507:BatchPhysicalExchange.BATCH_PHYSICAL.hash[0, 1]true.[](input=RelSubset#999,distribution=hash[key, value])] has lower cost {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 3.394292742113678E9 network, 4.944093593596532E9 memory} than best cost {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 3.3942927421136775E9 network, 4.944093593596532E9 memory} of subset [rel#1103:RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]] {code}
> I then looked up where the assertion is thrown and found this check:
>  
> {code:java}
> if (relCost.isLt(subset.bestCost)) {
>   return litmus.fail("rel [{}] has lower cost {} than "
>           + "best cost {} of subset [{}]",
>           rel, relCost, subset.bestCost, subset);
> } {code}
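> The check treats relCost as lower than the subset's best cost, so the exception is thrown.
> However, relCost is actually greater than the best cost: its network component is 3.394292742113678E9 versus 3.3942927421136775E9 for the best cost, and all other components are identical. The attached screenshot shows this: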
> [Attached screenshot: 截屏2022-06-18 上午11.48.46.png]
>  
> So the cost comparison logic in Flink appears to be broken.
> Looking into it, I found that FlinkCost#isLt depends on #isLe and #equals. However, #isLe compares normalized costs (via normalizeCost) while #equals does not, which makes the two methods inconsistent.
> In this case, relCost and bestCost have the same normalized cost: although their network components differ, the difference is lost once they are combined into a normalized cost, apparently due to double-precision rounding.
> As a result, #isLe returns true, while #equals compares io, network, and memory separately and returns false. Since #isLt is computed as #isLe(other) && !#equals(other), it returns true, which triggers the exception.
> To fix this, I think #equals should be changed to be consistent with the comparison used in #isLe.
>  
>  
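
The inconsistency described in the issue can be reproduced in isolation with the minimal Java sketch below. It is an illustration under stated assumptions, not Flink code. `SimpleCost`, its plain-sum `normalizeCost()`, and the method names `equalsComponentWise`, `equalsNormalized`, `isLtBuggy`, and `isLtFixed` are hypothetical. They do not reproduce `FlinkCost`, its actual normalization weights, or the fix that was merged. The input values are also illustrative and are not the reported costs. They are chosen so that the rounding is guaranteed: near 1e17, adjacent doubles are 16 apart, so adding a network cost of 1.0 or 2.0 produces the same sum. This mirrors the reported costs, which differ only in the last digits of the network component.

```java
/**
 * Minimal sketch of the isLe/equals inconsistency described in FLINK-28120.
 * SimpleCost and its plain-sum normalizeCost() are HYPOTHETICAL stand-ins;
 * they are not Flink's FlinkCost class or its actual normalization weights.
 */
public class CostComparisonSketch {

    static final class SimpleCost {
        final double rows, cpu, io, network, memory;

        SimpleCost(double rows, double cpu, double io, double network, double memory) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
            this.network = network;
            this.memory = memory;
        }

        // Hypothetical normalization: collapse the resource components into one double.
        double normalizeCost() {
            return cpu + io + network + memory;
        }

        // Like the isLe described in the issue: compares normalized costs.
        boolean isLe(SimpleCost other) {
            return this == other || normalizeCost() <= other.normalizeCost();
        }

        // Inconsistent equals: compares each component separately.
        boolean equalsComponentWise(SimpleCost other) {
            return this == other
                    || (rows == other.rows && cpu == other.cpu && io == other.io
                            && network == other.network && memory == other.memory);
        }

        // Direction proposed in the issue: equality on the same value isLe uses.
        boolean equalsNormalized(SimpleCost other) {
            return this == other || normalizeCost() == other.normalizeCost();
        }

        // isLt = isLe && !equals, built on the inconsistent equals.
        boolean isLtBuggy(SimpleCost other) {
            return isLe(other) && !equalsComponentWise(other);
        }

        // isLt = isLe && !equals, built on the consistent equals.
        boolean isLtFixed(SimpleCost other) {
            return isLe(other) && !equalsNormalized(other);
        }
    }

    public static void main(String[] args) {
        // Illustrative values: near 1e17 adjacent doubles are 16 apart, so adding
        // a network cost of 1.0 or 2.0 to the cpu cost rounds to the same sum.
        SimpleCost best = new SimpleCost(1.0, 1e17, 0.0, 1.0, 0.0);
        SimpleCost rel = new SimpleCost(1.0, 1e17, 0.0, 2.0, 0.0);

        System.out.println("rel has higher network cost: " + (rel.network > best.network)); // true
        System.out.println("same normalized cost: " + (rel.normalizeCost() == best.normalizeCost())); // true
        System.out.println("buggy isLt: " + rel.isLtBuggy(best)); // true
        System.out.println("fixed isLt: " + rel.isLtFixed(best)); // false
    }
}
```

With the component-wise equals, the costlier `rel` is reported as strictly cheaper than `best`. That is the situation that trips the planner's assertion. When both methods compare the same normalized value, isLt reduces to a strict `<` on the normalized cost, so two costs that round to the same value are treated as equal rather than as "less than".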



--
This message was sent by Atlassian Jira
(v8.20.10#820010)