Posted to user@flink.apache.org by lec ssmi <sh...@gmail.com> on 2020/08/31 05:33:17 UTC

runtime memory management

Hi,
  Generally speaking, when we submit a Flink program, we specify the number
of TaskManagers and the memory of each TM. The smallest real unit of
execution in Flink should be the operator.
  Since the calculation logic of each operator is different, some need to
keep state and some don't. Therefore, the memory each operator requires
should differ. How does Flink allocate TaskManager memory to operators by
default?
  In our production practice, as traffic increases, some operators (mainly
stateful ones such as join and group-by) often run short of memory, which
slows down the computation. The usual approach is to increase the memory of
the entire TaskManager. But will this additional memory also be allocated to
map-like operators? Or is memory within a TaskManager fetched on demand, so
that whoever needs memory takes it until it is used up, with no preset
allocation ratio? For a complex streaming job, is there any way to tilt the
memory towards stateful operators?

 Thanks.

Re: runtime memory management

Posted by Xintong Song <to...@gmail.com>.

Well, that's a long story. In general, there are 2 steps.

   1. *Which operators are deployed in the same slot?* Operators are first
   *chained*[1] together, then a *slot sharing strategy*[2] is applied by
   default.
   2. *Which task managers are slots allocated from?*
      1. For active deployments (Kubernetes, Yarn, Mesos), task
      managers are launched on demand. That means ideally you should
      not have too many empty slots.
      2. For the standalone deployment, by default slots are allocated
      randomly from all registered task managers. You can configure[3] the
      cluster to allocate slots evenly across task managers.
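
To make the two steps above concrete, here is a minimal Python sketch. It is a
toy model, not Flink's scheduler code. It assumes that subtask i of each
operator lands in slot i of its slot sharing group, and that "evenly" means
always picking the task manager with the most free slots. The names
plan_slots and spread_evenly, the group names, and the tm-A / tm-B capacities
are all made up for illustration.

```python
from collections import defaultdict


def plan_slots(operators):
    """Toy model of slot sharing (not Flink code).

    operators: list of (name, parallelism, slot_sharing_group) tuples.
    Returns {group: [operator names in slot 0, operator names in slot 1, ...]}.
    """
    by_group = defaultdict(list)
    for name, parallelism, group in operators:
        by_group[group].append((name, parallelism))

    plan = {}
    for group, ops in by_group.items():
        # A group needs as many slots as its highest operator parallelism.
        n_slots = max(p for _, p in ops)
        slots = [[] for _ in range(n_slots)]
        for name, p in ops:
            # Simplifying assumption: subtask i goes into slot i of the group.
            for i in range(p):
                slots[i].append(name)
        plan[group] = slots
    return plan


def spread_evenly(n_slots, task_managers):
    """Assign each slot request to the task manager with the most free slots.

    task_managers: {name: free slot count}. Returns a list of chosen names.
    """
    free = dict(task_managers)  # copy so the caller's dict is not modified
    assignment = []
    for _ in range(n_slots):
        tm = max(free, key=lambda name: free[name])
        if free[tm] == 0:
            raise RuntimeError("not enough free slots")
        free[tm] -= 1
        assignment.append(tm)
    return assignment


if __name__ == "__main__":
    # Hypothetical job: 10 operators, all with parallelism 4, default group.
    job = [(f"op{i}", 4, "default") for i in range(10)]
    plan = plan_slots(job)
    print(len(plan["default"]), "slots, each holding", len(plan["default"][0]), "operators")
    print(spread_evenly(len(plan["default"]), {"tm-A": 4, "tm-B": 4}))
```

In this model, 10 operators with parallelism 4 in one sharing group occupy 4
slots, and each slot holds one subtask of every operator. So placement is
decided per slot rather than per operator, and spreading the slots evenly
puts 2 of them on each of the two task managers.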


Thank you~

Xintong Song


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#tasks-and-operator-chains
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling.html
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#cluster-evenly-spread-out-slots

On Mon, Aug 31, 2020 at 4:31 PM lec ssmi <sh...@gmail.com> wrote:

> Thanks.
> When the program starts, how are operators assigned to TaskManagers?
> For example, if I have 2 TaskManagers and 10 operators, and 9 operators are
> allocated to tm-A while the remaining one is placed in tm-B, resource
> utilization will be very low.

Re: runtime memory management

Posted by lec ssmi <sh...@gmail.com>.

Thanks.
When the program starts, how are operators assigned to TaskManagers?
For example, if I have 2 TaskManagers and 10 operators, and 9 operators are
allocated to tm-A while the remaining one is placed in tm-B, resource
utilization will be very low.

Re: runtime memory management

Posted by Xintong Song <to...@gmail.com>.

Hi,

> For a complex streaming job, is there any way to tilt the memory towards
> stateful operators?

As far as streaming jobs are concerned, the quick answer is no. Memory is
fetched on demand by all operators.

Currently, managed memory is pre-planned per operator only for batch jobs.
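
As a rough illustration of "fetched on demand", here is a toy Python model. It
assumes a task manager's memory is one shared budget with no per-operator
reservation. Real Flink memory is split into several separately configured
regions, so this is only a sketch of the idea. The class TaskManagerMemory,
the operator names, and the sizes are hypothetical.

```python
class TaskManagerMemory:
    """Toy model: one shared budget, no per-operator reservation."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = {}  # operator name -> MB currently held

    def free_mb(self):
        return self.capacity_mb - sum(self.used_mb.values())

    def request(self, operator, mb):
        """Grant memory first come, first served; fail once the pool is exhausted."""
        if mb > self.free_mb():
            raise MemoryError(
                f"{operator} asked for {mb} MB, only {self.free_mb()} MB free"
            )
        self.used_mb[operator] = self.used_mb.get(operator, 0) + mb


if __name__ == "__main__":
    tm = TaskManagerMemory(capacity_mb=1024)
    tm.request("map", 64)    # stateless operator needs little
    tm.request("join", 900)  # stateful operator takes what it needs
    print(tm.used_mb, "free:", tm.free_mb())
```

In this model, raising capacity_mb lifts the ceiling for every operator alike.
Nothing earmarks the extra memory for the join, which matches the "no preset
allocation ratio" reading of the question.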

Thank you~

Xintong Song


