Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2023/03/23 08:52:15 UTC
[jira] [Updated] (FLINK-28120) Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then best cost of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]
[ https://issues.apache.org/jira/browse/FLINK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xintong Song updated FLINK-28120:
---------------------------------
Fix Version/s: 1.18.0
(was: 1.17.0)
> Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then best cost of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-28120
> URL: https://issues.apache.org/jira/browse/FLINK-28120
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Reporter: luoyuxia
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.18.0
>
> Attachments: 截屏2022-06-18 上午11.48.46.png
>
>
> When I run the following SQL with the Hive dialect:
>
> {code:java}
> create table src(key string, value string);
> SELECT key, value FROM
> (
>   SELECT key, value FROM src
>   UNION ALL
>   SELECT key, key as value FROM (
>     SELECT distinct key FROM (
>       SELECT key, value FROM (
>         SELECT key, value FROM src
>         UNION ALL
>         SELECT key, value FROM src
>       )t1
>       group by key, value)t2
>   )t3
> )t4
> group by key, value {code}
>
>
> it throws the following exception:
>
> {code:java}
> Caused by: java.lang.AssertionError: rel [rel#1507:BatchPhysicalExchange.BATCH_PHYSICAL.hash[0, 1]true.[](input=RelSubset#999,distribution=hash[key, value])] has lower cost {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 3.394292742113678E9 network, 4.944093593596532E9 memory} than best cost {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 3.3942927421136775E9 network, 4.944093593596532E9 memory} of subset [rel#1103:RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]] {code}
> I then checked the code where the assertion is thrown and found this:
>
> {code:java}
> if (relCost.isLt(subset.bestCost)) {
> return litmus.fail("rel [{}] has lower cost {} than "
> + "best cost {} of subset [{}]",
> rel, relCost, subset.bestCost, subset);
> } {code}
> The assertion fires because relCost is reported as less than the best cost.
> But relCost is actually greater than the best cost (its network component is 3.394292742113678E9 versus 3.3942927421136775E9), as shown here:
> !截屏2022-06-18 上午11.48.46.png|width=391,height=268!
>
> So the cost comparison logic in Flink appears to be broken.
> Looking at FlinkCost, the method #isLt depends on #isLe and #equals. But #isLe compares costs through normalizeCost while #equals does not, and that is the source of the inconsistency.
> In this case, the normalizeCost values of relCost and bestCost are the same. Although the network values differ, they collapse to the same value once folded into normalizeCost, which looks like precision loss in double arithmetic.
> So #isLe returns true, but #equals compares io, network, and memory separately and returns false. Then #isLt = #isLe(other) && !#equals(other) evaluates to true, which triggers the exception.
> To fix it, I think we should change #equals so that it is consistent with the comparison used in #isLe.
>
>
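The inconsistency described above can be reproduced in isolation. The following is a minimal sketch, not the actual FlinkCost code: the class SketchCost, its weights, and its method names are hypothetical, and the numbers are synthetic values chosen so that the rounding is easy to verify by hand. It shows how a field-by-field equals combined with a normalized isLe makes isLt report "strictly lower" for a cost that is not lower, and how an equals based on the same normalized value avoids that.

```java
// Hypothetical stand-in for a planner cost; NOT the real FlinkCost class.
final class SketchCost {
    // Assumed weights for illustration only; the real constants may differ.
    static final double MEMORY_WEIGHT = 1.0;
    static final double NETWORK_WEIGHT = 1.0;
    static final double IO_WEIGHT = 1.0;

    final double rows, cpu, io, network, memory;

    SketchCost(double rows, double cpu, double io, double network, double memory) {
        this.rows = rows;
        this.cpu = cpu;
        this.io = io;
        this.network = network;
        this.memory = memory;
    }

    // Folds memory, network and io into one double; small differences can be rounded away.
    double normalized() {
        return memory * MEMORY_WEIGHT + network * NETWORK_WEIGHT + io * IO_WEIGHT;
    }

    // "Less than or equal", decided on the normalized value first, then cpu, then rows.
    boolean isLe(SketchCost other) {
        double a = normalized();
        double b = other.normalized();
        if (a != b) {
            return a < b;
        }
        if (cpu != other.cpu) {
            return cpu < other.cpu;
        }
        return rows <= other.rows;
    }

    // Field-by-field equality: does NOT go through normalized().
    boolean fieldEquals(SketchCost other) {
        return rows == other.rows && cpu == other.cpu && io == other.io
                && network == other.network && memory == other.memory;
    }

    // Equality on the same quantities that isLe compares.
    boolean normalizedEquals(SketchCost other) {
        return normalized() == other.normalized() && cpu == other.cpu && rows == other.rows;
    }

    // The problematic combination: isLe && !equals with mismatched notions of equality.
    boolean isLtInconsistent(SketchCost other) {
        return isLe(other) && !fieldEquals(other);
    }

    // One possible consistent variant.
    boolean isLtConsistent(SketchCost other) {
        return isLe(other) && !normalizedEquals(other);
    }

    public static void main(String[] args) {
        // Synthetic values: memory is large enough that the tiny network difference is rounded away.
        SketchCost best = new SketchCost(1.0, 1.0, 0.0, 0.25, 1e16);
        SketchCost rel = new SketchCost(1.0, 1.0, 0.0, Math.nextUp(0.25), 1e16);

        System.out.println("rel.network > best.network: " + (rel.network > best.network));
        System.out.println("isLtInconsistent: " + rel.isLtInconsistent(best));
        System.out.println("isLtConsistent:   " + rel.isLtConsistent(best));
    }
}
```

Here rel has a slightly larger network value than best, yet isLtInconsistent reports it as strictly cheaper, which mirrors the assertion failure in the report. Whether the real fix should normalize inside #equals or compare the raw fields inside #isLe is a design choice this sketch does not settle.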
--
This message was sent by Atlassian Jira
(v8.20.10#820010)