Posted to issues@flink.apache.org by "Yun Gao (JIRA)" <ji...@apache.org> on 2019/06/17 08:53:00 UTC
[jira] [Updated] (FLINK-12171) The network buffer memory size should not be checked against the heap size on the TM side

     [ https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yun Gao updated FLINK-12171:
----------------------------
    Description: 
Currently, when computing the network buffer memory size on the TM side in _TaskManagerServices#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), the computed network buffer memory size is checked to be less than _maxJvmHeapMemory_. However, on the TM side, _maxJvmHeapMemory_ stores the maximum heap memory (namely -Xmx).

 

When a TM starts, -Xmx is computed in the RM or in _taskmanager.sh_ as (container memory - network buffer memory - managed memory). The check above therefore implies that the heap memory of the TM must be larger than the network memory, which does not seem necessary.

 

This may cause the TM to use more memory than expected. For example, for a job with high network throughput, users may configure the network memory to 2G. However, if users want to assign only 1G to heap memory, the TM will fail to start, and users have to allocate at least 2G of heap memory (in other words, 4G in total for the TM instead of 3G) to make the TM runnable. This wastes resources.
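
To make the numbers above concrete, here is a minimal sketch of the comparison as described in this issue. It is not the actual Flink code: the class and method names are hypothetical, and only the "network buffer size must be less than _maxJvmHeapMemory_" comparison is taken from the description.

```java
// Hypothetical illustration of the check described above; not Flink's implementation.
final class HeapBoundCheckExample {
    static final long GB = 1L << 30;

    // On the TM side, maxJvmHeapMemory holds -Xmx, so the network buffer
    // size ends up being compared against the heap size alone.
    static boolean passesCurrentCheck(long networkBufBytes, long maxJvmHeapMemory) {
        return networkBufBytes < maxJvmHeapMemory;
    }

    public static void main(String[] args) {
        long networkBufBytes = 2 * GB;

        // 2G network memory with a 1G heap is rejected.
        System.out.println(passesCurrentCheck(networkBufBytes, 1 * GB)); // false

        // The same network memory is only accepted once the heap exceeds it.
        System.out.println(passesCurrentCheck(networkBufBytes, 3 * GB)); // true
    }
}
```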

 

Therefore, I think the network buffer memory size should be checked against the total memory instead of the heap memory on the TM side:
 # Check that networkBufFraction < 1.0.
 # Compute the total memory as (jvmHeapNoNet / (1 - networkBufFraction)).
 # Compare the network buffer memory with the total memory.

This check is also consistent with the similar one done on the RM side.
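
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, _jvmHeapNoNet_ and _networkBufFraction_ are the names used in the steps, and the real change may differ (for example in how the min/max bounds of the network memory are handled).

```java
// Hypothetical sketch of the proposed check; names are illustrative, not Flink's API.
final class TotalMemoryCheckSketch {
    static final long GB = 1L << 30;

    // Steps 1 and 2: validate the fraction, then derive the total memory
    // from the heap size that excludes network buffers.
    static long estimateTotalMemory(long jvmHeapNoNet, double networkBufFraction) {
        if (!(networkBufFraction < 1.0)) {
            throw new IllegalArgumentException(
                    "networkBufFraction must be < 1.0, got " + networkBufFraction);
        }
        return (long) (jvmHeapNoNet / (1.0 - networkBufFraction));
    }

    // Step 3: compare the network buffer memory with the total memory
    // rather than with the heap size.
    static void checkNetworkBufferSize(
            long networkBufBytes, long jvmHeapNoNet, double networkBufFraction) {
        long totalMemory = estimateTotalMemory(jvmHeapNoNet, networkBufFraction);
        if (networkBufBytes >= totalMemory) {
            throw new IllegalArgumentException(
                    "Network buffer memory (" + networkBufBytes
                            + " bytes) must be less than the total memory ("
                            + totalMemory + " bytes)");
        }
    }

    public static void main(String[] args) {
        // 2G network memory, 1G heap, fraction 2/3: total is roughly 3G, so this is accepted.
        checkNetworkBufferSize(2 * GB, 1 * GB, 2.0 / 3.0);
        System.out.println("check passed");
    }
}
```

With this comparison, the 2G network / 1G heap configuration from the example is accepted, since 2G is below the derived total of about 3G.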

  was:
Currently, when computing the network buffer memory size on the TM side in _TaskManagerServices#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), the computed network buffer memory size is checked to be less than _maxJvmHeapMemory_. However, on the TM side, _maxJvmHeapMemory_ stores the maximum heap memory (namely -Xmx).

 

When a TM starts, -Xmx is computed in the RM or in _taskmanager.sh_ as (container memory - network buffer memory - managed memory). The check above therefore implies that the heap memory of the TM must be larger than the network memory, which does not seem necessary.

 

 

Therefore, I think the network buffer memory size should be checked against the total memory instead of the heap memory on the TM side:
 # Check that networkBufFraction < 1.0.
 # Compute the total memory as (jvmHeapNoNet / (1 - networkBufFraction)).
 # Compare the network buffer memory with the total memory.

This check is also consistent with the similar one done on the RM side.


> The network buffer memory size should not be checked against the heap size on the TM side
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-12171
>                 URL: https://issues.apache.org/jira/browse/FLINK-12171
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.7.2, 1.8.0
>         Environment: Flink-1.7.2; Flink-1.8 does not seem to have modified the logic here.
>  
>            Reporter: Yun Gao
>            Assignee: Yun Gao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)