Posted to user@flink.apache.org by Boris Lublinsky <bo...@lightbend.com> on 2018/01/13 16:07:30 UTC

Keyed State

Can you please confirm that my understanding is correct?
I am looking at the documentation on low-level joins (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#low-level-joins) and the example there.
When we do keyBy and then process, Flink maintains an instance per key and makes sure that, for a given key, the instance for that key is used. Correct?
This means that the value state for a given key is maintained by Flink, and in my code I do not need to worry about the key itself.
In my code I can use ValueState and assume that Flink will keep track of it on a per-key basis.

Boris Lublinsky
FDP Architect
boris.lublinsky@lightbend.com
https://www.lightbend.com/


Re: Keyed State

Posted by Boris Lublinsky <bo...@lightbend.com>.
Thanks Fabian


> On Jan 13, 2018, at 11:06 AM, Fabian Hueske <fh...@gmail.com> wrote:
> 
> Yes, that is correct.
> You can treat keyed ValueState like a distributed hashmap and Flink routes all state accesses to the entry for the key of the current record.


Re: Keyed State

Posted by Fabian Hueske <fh...@gmail.com>.
Sure.

A CoProcessFunction is executed in parallel by running multiple instances
of the CoProcessFunction. Each instance runs in a separate TaskManager slot
and is responsible for a subset of all keys. Keys are assigned by hash
partitioning to function instances.
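
As a rough illustration of the hash partitioning described above, here is a minimal sketch in plain Java. It is not Flink code: the class KeyAssignmentSketch, the method instanceFor, and the value 128 for maxParallelism are assumptions made for this example. As far as I know, Flink first maps a key to a "key group" and then maps key groups to parallel instances, and it additionally scrambles the key's hashCode with a murmur hash, which this sketch leaves out.

```java
public class KeyAssignmentSketch {

    /** Maps a key to one of `parallelism` function instances (simplified). */
    static int instanceFor(Object key, int parallelism, int maxParallelism) {
        // Simplification: the real runtime also applies a murmur hash to
        // key.hashCode() before taking the modulo; that step is omitted here.
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
        // Contiguous ranges of key groups belong to the same instance.
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int maxParallelism = 128; // assumed value for this sketch
        for (String key : new String[] {"user-1", "user-2", "user-3", "user-1"}) {
            System.out.println(key + " -> instance "
                    + instanceFor(key, parallelism, maxParallelism));
        }
    }
}
```

The point of the sketch is only that the assignment is deterministic: the same key always lands on the same instance, so that instance can own the state for the key.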

All calls to methods of an individual CoProcessFunction instance are
synchronized, i.e., processElement1, processElement2, and onTimer are never
called concurrently. A long-running processElement1 method will block all
method calls for all keys that are assigned to the same instance (not just
method calls for the same key).
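
To make the blocking effect concrete, here is a small simulation in plain Java. It is a toy model and not Flink code: SerialInstanceSketch, Call, and finishTimes are hypothetical names, and time is simulated with a counter instead of real threads. It only shows what "calls on one instance never overlap" implies for a fast call that is queued behind a slow one.

```java
import java.util.ArrayList;
import java.util.List;

public class SerialInstanceSketch {

    /** One call into the function: which key, and how long the call takes. */
    static class Call {
        final String key;
        final long costMillis;

        Call(String key, long costMillis) {
            this.key = key;
            this.costMillis = costMillis;
        }
    }

    /**
     * Runs the calls strictly one after another, as a single function
     * instance would, and returns the simulated finish time of each call.
     */
    static List<Long> finishTimes(List<Call> calls) {
        List<Long> result = new ArrayList<>();
        long clock = 0;
        for (Call call : calls) {
            clock += call.costMillis; // the next call cannot start before this one ends
            result.add(clock);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Call> calls = new ArrayList<>();
        calls.add(new Call("key-A", 100)); // slow call, e.g. a long processElement1
        calls.add(new Call("key-B", 1));   // fast call for a different key
        System.out.println(finishTimes(calls)); // prints [100, 101]
    }
}
```

The call for key-B takes 1 ms on its own but finishes at 101 ms, because it shares the instance with the slow call for key-A.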

2018-01-13 20:33 GMT+01:00 Boris Lublinsky <bo...@lightbend.com>:

> Thanks Fabian
> Can you also explain the threading model?
> How is work parallelized across multiple keys? Is it hash-based?
> Also, are processElement1 and processElement2 executed on different threads?
> More specifically, if processElement1 is an order of magnitude slower than
> processElement2, will it impact processElement2?

Re: Keyed State

Posted by Boris Lublinsky <bo...@lightbend.com>.
Thanks Fabian
Can you also explain the threading model?
How is work parallelized across multiple keys? Is it hash-based?
Also, are processElement1 and processElement2 executed on different threads?
More specifically, if processElement1 is an order of magnitude slower than processElement2, will it impact processElement2?





Re: Keyed State

Posted by Fabian Hueske <fh...@gmail.com>.
Yes, that is correct.
You can treat keyed ValueState like a distributed hashmap and Flink routes
all state accesses to the entry for the key of the current record.
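
The hashmap analogy can be sketched in plain Java as follows. This is a toy model, not the Flink API: ToyValueState, setCurrentKey, and run are hypothetical names invented for the example, and the real state is distributed across parallel instances rather than held in one map. The sketch only shows the division of labor: the runtime selects the entry for the current record's key, and user code reads and updates the value without ever mentioning the key.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateSketch {

    /** Toy stand-in for a keyed value state of type Long: one entry per key. */
    static class ToyValueState {
        private final Map<String, Long> entries = new HashMap<>();
        private String currentKey; // set by the "runtime" before each record

        void setCurrentKey(String key) { currentKey = key; }

        Long value() { return entries.get(currentKey); } // null if nothing stored yet

        void update(Long v) { entries.put(currentKey, v); }
    }

    /** User code: counts records per key, but never refers to the key. */
    static long processElement(ToyValueState count) {
        Long current = count.value();
        long next = (current == null) ? 1L : current + 1L;
        count.update(next);
        return next;
    }

    /** Plays the role of the runtime: selects the key, then calls user code. */
    static Map<String, Long> run(String... keys) {
        ToyValueState state = new ToyValueState();
        Map<String, Long> latest = new HashMap<>();
        for (String key : keys) {
            state.setCurrentKey(key);
            latest.put(key, processElement(state));
        }
        return latest;
    }

    public static void main(String[] args) {
        System.out.println(run("a", "b", "a", "a", "b"));
    }
}
```

With the input above, the count ends at 3 for key "a" and 2 for key "b", even though processElement itself contains no per-key logic.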
