Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2020/05/27 08:40:00 UTC

[jira] [Commented] (FLINK-17958) Kubernetes session constantly allocates taskmanagers after cancel a job

    [ https://issues.apache.org/jira/browse/FLINK-17958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117515#comment-17117515 ] 

Xintong Song commented on FLINK-17958:
--------------------------------------

True. I agree that the bug in {{MathUtils#divideRoundUp}} is the cause of the problem.

I think we should do the following:
* Make {{MathUtils#divideRoundUp}} return 0 when {{dividend}} is 0.
* Check that {{dividend >= 0 && divisor > 0}}, and throw an exception otherwise.
* Add more test cases for {{MathUtils#divideRoundUp}}.
* In {{WorkerSpecContainerResourceAdapter#normalize}}, return {{unitValue}} when {{MathUtils#divideRoundUp}} returns 0.
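
For illustration, the points above could look roughly like the following. This is only a minimal sketch, not the actual patch: the class name {{DivideRoundUpSketch}}, the method {{divideRoundUpBuggy}}, the exception type, and the body of {{normalize}} are assumptions made for this example, and the real {{WorkerSpecContainerResourceAdapter#normalize}} may be shaped differently.

```java
/** Hypothetical stand-alone sketch; not the real Flink MathUtils class. */
final class DivideRoundUpSketch {

    private DivideRoundUpSketch() {}

    /** The formula quoted in the issue: returns 1 for dividend == 0 and divisor > 1. */
    static int divideRoundUpBuggy(int dividend, int divisor) {
        return (dividend - 1) / divisor + 1;
    }

    /** Proposed behaviour: validate the arguments and return 0 for a zero dividend. */
    static int divideRoundUp(int dividend, int divisor) {
        if (dividend < 0 || divisor <= 0) {
            // Assumption: IllegalArgumentException; the real fix may use another check.
            throw new IllegalArgumentException(
                "Expected dividend >= 0 and divisor > 0, got " + dividend + " / " + divisor);
        }
        return dividend == 0 ? 0 : (dividend - 1) / divisor + 1;
    }

    /**
     * Assumed shape of normalize: round value up to a multiple of unitValue,
     * but never return less than one unit.
     */
    static int normalize(int value, int unitValue) {
        int units = divideRoundUp(value, unitValue);
        return units == 0 ? unitValue : units * unitValue;
    }

    public static void main(String[] args) {
        System.out.println(divideRoundUpBuggy(0, 2)); // 1 (the bug)
        System.out.println(divideRoundUp(0, 2));      // 0
        System.out.println(divideRoundUp(3, 2));      // 2
        System.out.println(normalize(0, 128));        // 128
    }
}
```

The explicit zero check is needed because Java integer division truncates toward zero, so {{(0 - 1) / divisor}} is 0 rather than -1 for any {{divisor}} greater than 1.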

> Kubernetes session constantly allocates taskmanagers after cancel a job
> -----------------------------------------------------------------------
>
>                 Key: FLINK-17958
>                 URL: https://issues.apache.org/jira/browse/FLINK-17958
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.0, 1.12.0
>            Reporter: Yang Wang
>            Priority: Blocker
>             Fix For: 1.11.0
>
>
> When testing {{kubernetes-session.sh}}, I found that the {{KubernetesResourceManager}} keeps allocating TaskManagers after a job is cancelled. I think this may be caused by a bug in the following code. When {{dividend}} is 0 and {{divisor}} is greater than 1, the return value is 1, because {{(0 - 1) / divisor}} truncates to 0 under Java integer division. However, we expect it to be 0.
> {code:java}
> /**
>  * Divide and rounding up to integer.
>  * E.g., divideRoundUp(3, 2) returns 2.
>  * @param dividend value to be divided by the divisor
>  * @param divisor value by which the dividend is to be divided
>  * @return the quotient rounding up to integer
>  */
> public static int divideRoundUp(int dividend, int divisor) {
>    return (dividend - 1) / divisor + 1;
> }{code}
>  
> How to reproduce this issue?
>  # Start a Kubernetes session
>  # Submit a Flink job to the existing session
>  # Cancel the job and wait for the TaskManager to be released via idle timeout
>  # More and more TaskManagers will be allocated



--
This message was sent by Atlassian Jira
(v8.3.4#803005)