Posted to user@flink.apache.org by dz902 <dz...@gmail.com> on 2022/03/28 04:26:58 UTC

Where is the "Partitioned All Cache" doc?

Hi,

I've read some docs
(https://help.aliyun.com/document_detail/182011.html) describing Flink
optimization techniques that use:

- partitionedJoin = 'true'
- cache = 'ALL'
- blink.partialAgg.enabled=true

However, I could not find these in any official docs. Are they
supported at all?

Also, "partitionedJoin" seems to shuffle the input by join key so that
each partition can fit into memory. I read this
(https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html)
and believe this is already the default behavior of Flink.

Is this optimization not needed even for huge input tables?

Thanks,
Dai

Re: Where is the "Partitioned All Cache" doc?

Posted by dz902 <dz...@gmail.com>.

This is interesting. Thanks for the clarification!

On Mon, Mar 28, 2022 at 4:09 PM Qingsheng Ren <re...@gmail.com> wrote:
>
> Hi,
>
> The optimization you mentioned is only available in the product provided by Alibaba Cloud. In open-source Apache Flink there isn’t a unified caching abstraction for all lookup tables, and each connector has its own cache implementation. For example, JDBC uses a Guava cache and FileSystem uses an in-memory HashMap, and neither of them loads all records of the dim table into the cache.
>
> Best,
>
> Qingsheng

Re: Where is the "Partitioned All Cache" doc?

Posted by Qingsheng Ren <re...@gmail.com>.

Hi, 

The optimization you mentioned is only available in the product provided by Alibaba Cloud. In open-source Apache Flink there isn’t a unified caching abstraction for all lookup tables, and each connector has its own cache implementation. For example, JDBC uses a Guava cache and FileSystem uses an in-memory HashMap, and neither of them loads all records of the dim table into the cache.
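
To make the difference from an "ALL" cache concrete, here is a minimal sketch of a partial lookup cache: rows are loaded lazily, one key at a time, and the least recently used row is evicted once a size limit is reached. This is only an illustration in Python, not Flink's or any connector's actual code; the class name PartialLookupCache, the max_rows parameter, and the DIM_TABLE dict standing in for an external database are all hypothetical.

```python
from collections import OrderedDict


class PartialLookupCache:
    """Bounded LRU cache that loads rows lazily, one key at a time."""

    def __init__(self, loader, max_rows):
        self._loader = loader        # function: key -> row (hits the external table)
        self._max_rows = max_rows    # upper bound on cached rows
        self._rows = OrderedDict()   # insertion order doubles as recency order
        self.loads = 0               # number of lookups that went to the loader

    def get(self, key):
        if key in self._rows:
            # Cache hit: mark the row as most recently used.
            self._rows.move_to_end(key)
            return self._rows[key]
        # Cache miss: fetch only this key, never the whole table.
        self.loads += 1
        row = self._loader(key)
        self._rows[key] = row
        if len(self._rows) > self._max_rows:
            # Evict the least recently used row.
            self._rows.popitem(last=False)
        return row

    def __len__(self):
        return len(self._rows)


# Hypothetical dimension table standing in for an external database.
DIM_TABLE = {1: "alice", 2: "bob", 3: "carol"}

cache = PartialLookupCache(loader=DIM_TABLE.get, max_rows=2)
names = [cache.get(k) for k in (1, 2, 1, 3, 2)]
```

An "ALL" style cache would instead copy every row of DIM_TABLE into memory up front, so no lookup would ever miss, at the cost of holding the whole table in memory.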

Best, 

Qingsheng
