Posted to user@flink.apache.org by Dan Hill <qu...@gmail.com> on 2021/03/09 17:57:37 UTC

Best practices for complex state manipulation

Hi!

I'm working on a join setup that does fuzzy matching in case the client
does not send enough parameters to join by a foreign key.  There are a few
ways I can store the state, and I'm curious about best practices around this.
I'm using RocksDB as the state storage.

I was reading the code for IntervalJoin
<https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/co/IntervalJoinOperator.java>
and was a little shocked by the implementation.  It feels designed for very
short join intervals.

I read this set of pages
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html>
but I'm looking for something one level deeper.  E.g. what are the performance
characteristics of the different types of state CRUD operations with RocksDB?
E.g. I could create an extra MapState to act as an index.  When is this worth
it?
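
A minimal sketch of the index idea, with plain java.util maps standing in
for two MapState instances so that it runs without Flink. All class, method,
and field names here are hypothetical, and it says nothing about actual
RocksDB costs; it only shows the bookkeeping an extra index requires:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for keyed state plus a secondary index (no Flink APIs). */
public class IndexedStateSketch {
    // Primary "state": event id -> user id (assumed analogue of a MapState).
    private final Map<String, String> userByEventId = new HashMap<>();
    // Secondary index: user id -> event ids (assumed analogue of an extra MapState).
    private final Map<String, Set<String>> eventIdsByUser = new HashMap<>();

    /** Every write has to update both the primary map and the index. */
    public void put(String eventId, String userId) {
        String previous = userByEventId.put(eventId, userId);
        if (previous != null && !previous.equals(userId)) {
            removeFromIndex(previous, eventId);
        }
        eventIdsByUser.computeIfAbsent(userId, k -> new HashSet<>()).add(eventId);
    }

    /** Every delete has to clean up the index as well. */
    public void remove(String eventId) {
        String userId = userByEventId.remove(eventId);
        if (userId != null) {
            removeFromIndex(userId, eventId);
        }
    }

    /** Indexed lookup: a single get on the index. */
    public Set<String> eventIdsForUser(String userId) {
        Set<String> ids = eventIdsByUser.get(userId);
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }

    /** Lookup without the index: iterates over every primary entry. */
    public Set<String> eventIdsForUserByScan(String userId) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, String> entry : userByEventId.entrySet()) {
            if (entry.getValue().equals(userId)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    private void removeFromIndex(String userId, String eventId) {
        Set<String> ids = eventIdsByUser.get(userId);
        if (ids != null) {
            ids.remove(eventId);
            if (ids.isEmpty()) {
                eventIdsByUser.remove(userId);
            }
        }
    }

    public static void main(String[] args) {
        IndexedStateSketch state = new IndexedStateSketch();
        state.put("e1", "u1");
        state.put("e2", "u1");
        state.put("e3", "u2");
        state.remove("e2");
        // Both lookups return the same ids; only the access pattern differs.
        System.out.println(state.eventIdsForUser("u1"));
        System.out.println(state.eventIdsForUserByScan("u1"));
    }
}
```

The trade-off it illustrates: the indexed lookup avoids iterating over every
entry, but each put and remove now touches two structures, which with RocksDB
would presumably mean extra serialized reads and writes.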

Re: Best practices for complex state manipulation

Posted by Dan Hill <qu...@gmail.com>.
Thanks Gordon and Seth!

On Wed, Mar 10, 2021, 21:55 Tzu-Li (Gordon) Tai <tz...@apache.org> wrote:

> Hi Dan,
>
> For a deeper dive into state backends and how they manage state, or
> performance-critical aspects such as state serialization and choosing
> appropriate state structures, I highly recommend starting with this webinar
> done by my colleague Seth Wiesman:
> https://www.youtube.com/watch?v=9GF8Hwqzwnk.
>
> Cheers,
> Gordon

Re: Best practices for complex state manipulation

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Dan,

For a deeper dive into state backends and how they manage state, or
performance-critical aspects such as state serialization and choosing
appropriate state structures, I highly recommend starting with this webinar
done by my colleague Seth Wiesman:
https://www.youtube.com/watch?v=9GF8Hwqzwnk.

Cheers,
Gordon
