Posted to issues@flink.apache.org by "godfrey he (JIRA)" <ji...@apache.org> on 2019/02/25 07:40:00 UTC

[jira] [Updated] (FLINK-11714) Add cost model for both batch and streaming

     [ https://issues.apache.org/jira/browse/FLINK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

godfrey he updated FLINK-11714:
-------------------------------
    Description: 
Calcite's default cost model contains only ROWS, IO and CPU, and it ignores IO and CPU when two costs are compared.

There are two improvements:

1. Add NETWORK and MEMORY to represent distribution cost and memory usage.

2. The optimization goal is currently to use minimal resources, so the factors are compared in this order:
    (1). First compare CPU. Every operator uses CPU, so we consider it the most important factor.
    (2). Then compare MEMORY, NETWORK and IO as a single normalized value. Their relative order is hard to decide, so each is converted to CPU cost using its own ratio.
    (3). Finally compare ROWS. ROWS is already counted when calculating the other factors.
         e.g. CPU of Sort = nLogN(ROWS) * number of sort keys, CPU of Filter = ROWS * condition cost on a row.
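
The comparison order above can be sketched roughly as follows. This is only an illustration in Python, not the Flink or Calcite implementation: the `Cost` class, the helper functions and the three conversion ratios are hypothetical, since the issue does not state concrete ratio values or the logarithm base.

```python
import math
from dataclasses import dataclass

# Hypothetical ratios for converting each factor to CPU units.
# The issue only says "different ratio"; these values are made up.
MEMORY_TO_CPU = 1.0
NETWORK_TO_CPU = 4.0
IO_TO_CPU = 2.0


@dataclass(frozen=True)
class Cost:
    """Illustrative cost with the five factors named in the issue."""
    rows: float
    cpu: float
    io: float = 0.0
    network: float = 0.0
    memory: float = 0.0

    def normalized(self) -> float:
        # Step (2): fold MEMORY, NETWORK and IO into one CPU-equivalent value.
        return (self.memory * MEMORY_TO_CPU
                + self.network * NETWORK_TO_CPU
                + self.io * IO_TO_CPU)

    def compare_key(self):
        # Lexicographic order: CPU first, normalized value second, ROWS last.
        return (self.cpu, self.normalized(), self.rows)

    def is_cheaper_than(self, other: "Cost") -> bool:
        return self.compare_key() < other.compare_key()


def sort_cpu(rows: float, num_sort_keys: int) -> float:
    # CPU of Sort = nLogN(ROWS) * number of sort keys (base 2 is an assumption).
    n = max(rows, 1.0)
    return n * math.log2(n) * num_sort_keys


def filter_cpu(rows: float, condition_cost_per_row: float) -> float:
    # CPU of Filter = ROWS * condition cost on a row.
    return rows * condition_cost_per_row


if __name__ == "__main__":
    a = Cost(rows=100, cpu=10, network=5)
    b = Cost(rows=1, cpu=11)
    print(a.is_cheaper_than(b))  # CPU decides before any other factor
```

The sketch compares with exact tuple ordering to keep it short. A real cost model would likely compare floating-point factors with a tolerance, so that nearly equal CPU values fall through to the next factor.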

  was:
Calcite's default cost model only contains ROWS, IO and CPU, and does not take IO and CPU into account when the cost is compared.

There are two improvements:

1. Add NETWORK and MEMORY to represent distribution cost and memory usage.

2. Compare the CPU value first, because every operator uses CPU. Compare the ROWS value last, because ROWS is already counted when calculating the other values, e.g. CPU of Sort = nLogN(ROWS) * number of sort keys.

 


> Add cost model for both batch and streaming
> -------------------------------------------
>
>                 Key: FLINK-11714
>                 URL: https://issues.apache.org/jira/browse/FLINK-11714
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>            Priority: Major
>
> Calcite's default cost model contains only ROWS, IO and CPU, and it ignores IO and CPU when two costs are compared.
> There are two improvements:
> 1. Add NETWORK and MEMORY to represent distribution cost and memory usage.
> 2. The optimization goal is currently to use minimal resources, so the factors are compared in this order:
>     (1). First compare CPU. Every operator uses CPU, so we consider it the most important factor.
>     (2). Then compare MEMORY, NETWORK and IO as a single normalized value. Their relative order is hard to decide, so each is converted to CPU cost using its own ratio.
>     (3). Finally compare ROWS. ROWS is already counted when calculating the other factors.
>          e.g. CPU of Sort = nLogN(ROWS) * number of sort keys, CPU of Filter = ROWS * condition cost on a row.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)