You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@flink.apache.org by snack white <am...@gmail.com> on 2020/07/28 03:42:53 UTC

improve the performance of flink sql job which lookup 40+ table

HI:
      My  flink  version is 1.10  use per-job mode , my   sql like 

```
select
  column1, t2.xx2, t3.xx3,t4.xx4
  …  t40.xx40
from
  main_table 
  left join lookup_1 FOR SYSTEM_TIME AS OF t1.proc_time AS t2 on t1.xx= t2.xx
  left join lookup_2 FOR SYSTEM_TIME AS OF t1.proc_time AS t3 on t1.xx= t3.xx
  left join lookup_3 FOR SYSTEM_TIME AS OF t1.proc_time AS t4 on t1.xx= t4.xx
  left join lookup_4 FOR SYSTEM_TIME AS OF t1.proc_time AS t5 on t1.xx= t5.xx
  left join lookup_5 FOR SYSTEM_TIME AS OF t1.proc_time AS t6 on t1.xx= t6.xx
  left join lookup_6 FOR SYSTEM_TIME AS OF t1.proc_time AS t7 on t1.xx= t7.xx
...

  left join lookup_40 FOR SYSTEM_TIME AS OF t1.proc_time AS t40 on t1.xx= t40.xx
```

I have developed the async lookup feature , but that is not enough， maybe the current look up table is serial not parallelism ?  

Now I need help about how can I  improve the performance of my sql job .

Best 
White

Re: improve the performance of flink sql job which lookup 40+ table

Posted by Jark Wu <im...@gmail.com>.

Hi,

Yes, currently, multiple lookup join is not parallel and execute one by
one.
Async lookup + cache is the suggested way to improve performance.
If the lookup tables are not large, you can also implement a ALL cache for
the LookupTableSource to cache all the data in the database, and reload
periodically.

In Flink 1.12, we will support temporal join changelog [1] which will join
the changelog stream instead of lookup database,
 this will greatly improve the performance and have lowest latency.

Best,
Jark

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL

On Tue, 28 Jul 2020 at 11:43, snack white <am...@gmail.com> wrote:

> HI:
>       My  flink  version is 1.10  use per-job mode , my   sql like
>
> ```
> select
>   column1, t2.xx2, t3.xx3,t4.xx4
>   …  t40.xx40
> from
>   main_table
>   left join lookup_1 FOR SYSTEM_TIME AS OF t1.proc_time AS t2 on t1.xx=
> t2.xx
>   left join lookup_2 FOR SYSTEM_TIME AS OF t1.proc_time AS t3 on t1.xx=
> t3.xx
>   left join lookup_3 FOR SYSTEM_TIME AS OF t1.proc_time AS t4 on t1.xx=
> t4.xx
>   left join lookup_4 FOR SYSTEM_TIME AS OF t1.proc_time AS t5 on t1.xx=
> t5.xx
>   left join lookup_5 FOR SYSTEM_TIME AS OF t1.proc_time AS t6 on t1.xx=
> t6.xx
>   left join lookup_6 FOR SYSTEM_TIME AS OF t1.proc_time AS t7 on t1.xx=
> t7.xx
> ...
>
>   left join lookup_40 FOR SYSTEM_TIME AS OF t1.proc_time AS t40 on t1.xx=
> t40.xx
> ```
>
> I have developed the async lookup feature , but that is not enough， maybe
> the current look up table is serial not parallelism ?
>
> Now I need help about how can I  improve the performance of my sql job .
>
> Best
> White
>
>
>