Posted to user@flink.apache.org by narasimha <sw...@gmail.com> on 2020/08/10 07:36:30 UTC

TaskManagers are still up even after job execution completed in PerJob deployment mode

I'm trying out Flink Per-Job deployment using docker-compose.

Configurations:

version: "2.2"
services:
  jobmanager:
    build: ./
    image: flink_local:1.1
    ports:
      - "8081:8081"
    # standalone-job (per-job) mode: the JobManager runs only this job and shuts down when it finishes
    command: standalone-job --job-classname com.organization.BatchJob
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        parallelism.default: 2
  taskmanager:
    image: flink_local:1.1
    depends_on:
      - jobmanager
    command: taskmanager
    scale: 1
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2
        parallelism.default: 2

The Flink image is extended with job.jar, and the job executed successfully.
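
For reference, the image referenced by build: ./ is just the official image
extended with the job jar, roughly like this (a sketch of my local setup; the
base image tag and the usrlib path are assumptions on my side, job.jar is
whatever the build produces):

FROM flink:1.11.0-scala_2.11
# put the job artifact where the standalone-job entrypoint can find it
RUN mkdir -p $FLINK_HOME/usrlib
COPY job.jar $FLINK_HOME/usrlib/job.jar

The stack is then started with a plain docker-compose up.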

The JobManager exited after the job completed, but the TaskManager is
still running, which is not expected.

Are there any configurations that have to be added so that both the
JobManager and TaskManager exit?

Versions:

Flink - 1.11.0

Java - 1.8

-- 
A.Narasimha Swamy

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

Posted by narasimha <sw...@gmail.com>.
Thanks, Till.

Currently, the instance gets a timeout error, which terminates the
TaskManager.
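
In case it helps anyone else: as far as I can tell, the timeout in question
is the TaskManager registration timeout (if I'm reading the docs right, the
property is taskmanager.registration.timeout, 5 minutes by default), so it
could be shortened through FLINK_PROPERTIES in the compose file, e.g.
something like:

  FLINK_PROPERTIES=
  jobmanager.rpc.address: jobmanager
  taskmanager.numberOfTaskSlots: 2
  taskmanager.registration.timeout: 1 min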

Sure, will try native K8s.

On Thu, Aug 13, 2020 at 3:12 PM Till Rohrmann <tr...@apache.org> wrote:

> Hi Narasimha,
>
> if you are deploying the Flink cluster manually on K8s then there is
> no automatic way of stopping the TaskExecutor/TaskManager pods. This is
> something you have to do manually (similar to a standalone deployment). The
> only clean up mechanism is the automatic termination of the TaskManager
> processes if they cannot connect to the ResourceManager after the specified
> timeout. However, you can use Flink's native K8s integration with which you
> can also deploy a per-job mode cluster [1]. The native K8s integration is
> able to clean up the whole cluster.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>
> Cheers,
> Till
>
> On Thu, Aug 13, 2020 at 11:26 AM Kostas Kloudas <kk...@gmail.com>
> wrote:
>
>> Hi Narasimha,
>>
>> I am not sure why the TMs are not shutting down, as Yun said, so I am
>> cc'ing Till here as he may be able to shed some light.
>> For the application mode, the page in the documentation that you
>> pointed is the recommended way to deploy an application in application
>> mode.
>>
>> Cheers,
>> Kostas
>>
>> On Mon, Aug 10, 2020 at 11:16 AM narasimha <sw...@gmail.com>
>> wrote:
>> >
>> > Thanks, Yun for the prompt reply.
>> >
>> > TaskManager was actively looking for ResourceManager, on timeout of 5
>> mins it got terminated.
>> >
>> > Any recommendations around this? Or is this the way this will work.
>> >
>> > What should be done around this to make the application start in
>> application deployment mode?
>> >
>> >
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#start-flink-application
>> >
>> > Here it has shown to invoke Flink binary to start. Is this the
>> preferred way?
>> >
>> >
>> > On Mon, Aug 10, 2020 at 1:46 PM Yun Tang <my...@live.com> wrote:
>> >>
>> >> Hi
>> >>
>> >> From your description, the task managers are still alive even the job
>> is finished and job manager has shut down?
>> >> If so, I think this is really weird, could you check what the TM is
>> doing via jstack and the logs in job manager and idle task manager?
>> >> The task manager should be released when the JM is shutting down.
>> >> Moreover, idle task manager would also release after 30 seconds by
>> default [1].
>> >>
>> >>
>> >> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#resourcemanager-taskmanager-timeout
>> >>
>> >> Best
>> >> Yun Tang
>> >>
>> >>
>> >> ________________________________
>> >> From: narasimha <sw...@gmail.com>
>> >> Sent: Monday, August 10, 2020 15:36
>> >> To: user@flink.apache.org <us...@flink.apache.org>
>> >> Subject: TaskManagers are still up even after job execution completed
>> in PerJob deployment mode
>> >>
>> >>
>> >> I'm trying out Flink Per-Job deployment using docker-compose.
>> >>
>> >> Configurations:
>> >>
>> >> version: "2.2"
>> >> jobs:
>> >>   jobmanager:
>> >>     build: ./
>> >>     image: flink_local:1.1
>> >>     ports:
>> >>       - "8081:8081"
>> >>     command: standalone-job --job-classname com.organization.BatchJob
>> >>     environment:
>> >>       - |
>> >>         FLINK_PROPERTIES=
>> >>         jobmanager.rpc.address: jobmanager
>> >>         parallelism.default: 2
>> >>   taskmanager:
>> >>     image: flink_local:1.1
>> >>     depends_on:
>> >>       - jobmanager
>> >>     command: taskmanager
>> >>     scale: 1
>> >>     environment:
>> >>       - |
>> >>         FLINK_PROPERTIES=
>> >>         jobmanager.rpc.address: jobmanager
>> >>         taskmanager.numberOfTaskSlots: 2
>> >>         parallelism.default: 2
>> >>
>> >> Flink image is extended with job.jar, Job executed successfully.
>> >>
>> >> JobManager exited after the job is completed, but is still running,
>> which is not expected.
>> >>
>> >> Any configurations have to be added to exit both JobManager and
>> TaskManger?
>> >>
>> >> Versions:
>> >>
>> >> Flink - 1.11.0
>> >>
>> >> Java - 1.8
>> >>
>> >>
>> >> --
>> >> A.Narasimha Swamy
>> >>
>> >
>> >
>> > --
>> > A.Narasimha Swamy
>> >
>>
>

-- 
A.Narasimha Swamy

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

Posted by Till Rohrmann <tr...@apache.org>.
Hi Narasimha,

if you are deploying the Flink cluster manually on K8s then there is
no automatic way of stopping the TaskExecutor/TaskManager pods. This is
something you have to do manually (similar to a standalone deployment). The
only clean up mechanism is the automatic termination of the TaskManager
processes if they cannot connect to the ResourceManager after the specified
timeout. However, you can use Flink's native K8s integration with which you
can also deploy a per-job mode cluster [1]. The native K8s integration is
able to clean up the whole cluster.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
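
For example, the application-mode launch command documented on that page
looks roughly like the following (cluster id, image name, parallelism and
jar path are placeholders you would adapt, and the image has to be reachable
from the Kubernetes cluster):

./bin/flink run-application -p 2 -t kubernetes-application \
    -Dkubernetes.cluster-id=my-flink-application \
    -Dkubernetes.container.image=flink_local:1.1 \
    local:///opt/flink/usrlib/job.jar

When the application finishes, the native integration cleans up the whole
cluster, including the TaskManager pods.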

Cheers,
Till

On Thu, Aug 13, 2020 at 11:26 AM Kostas Kloudas <kk...@gmail.com> wrote:

> Hi Narasimha,
>
> I am not sure why the TMs are not shutting down, as Yun said, so I am
> cc'ing Till here as he may be able to shed some light.
> For the application mode, the page in the documentation that you
> pointed is the recommended way to deploy an application in application
> mode.
>
> Cheers,
> Kostas
>
> On Mon, Aug 10, 2020 at 11:16 AM narasimha <sw...@gmail.com> wrote:
> >
> > Thanks, Yun for the prompt reply.
> >
> > TaskManager was actively looking for ResourceManager, on timeout of 5
> mins it got terminated.
> >
> > Any recommendations around this? Or is this the way this will work.
> >
> > What should be done around this to make the application start in
> application deployment mode?
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#start-flink-application
> >
> > Here it has shown to invoke Flink binary to start. Is this the preferred
> way?
> >
> >
> > On Mon, Aug 10, 2020 at 1:46 PM Yun Tang <my...@live.com> wrote:
> >>
> >> Hi
> >>
> >> From your description, the task managers are still alive even the job
> is finished and job manager has shut down?
> >> If so, I think this is really weird, could you check what the TM is
> doing via jstack and the logs in job manager and idle task manager?
> >> The task manager should be released when the JM is shutting down.
> >> Moreover, idle task manager would also release after 30 seconds by
> default [1].
> >>
> >>
> >> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#resourcemanager-taskmanager-timeout
> >>
> >> Best
> >> Yun Tang
> >>
> >>
> >> ________________________________
> >> From: narasimha <sw...@gmail.com>
> >> Sent: Monday, August 10, 2020 15:36
> >> To: user@flink.apache.org <us...@flink.apache.org>
> >> Subject: TaskManagers are still up even after job execution completed
> in PerJob deployment mode
> >>
> >>
> >> I'm trying out Flink Per-Job deployment using docker-compose.
> >>
> >> Configurations:
> >>
> >> version: "2.2"
> >> jobs:
> >>   jobmanager:
> >>     build: ./
> >>     image: flink_local:1.1
> >>     ports:
> >>       - "8081:8081"
> >>     command: standalone-job --job-classname com.organization.BatchJob
> >>     environment:
> >>       - |
> >>         FLINK_PROPERTIES=
> >>         jobmanager.rpc.address: jobmanager
> >>         parallelism.default: 2
> >>   taskmanager:
> >>     image: flink_local:1.1
> >>     depends_on:
> >>       - jobmanager
> >>     command: taskmanager
> >>     scale: 1
> >>     environment:
> >>       - |
> >>         FLINK_PROPERTIES=
> >>         jobmanager.rpc.address: jobmanager
> >>         taskmanager.numberOfTaskSlots: 2
> >>         parallelism.default: 2
> >>
> >> Flink image is extended with job.jar, Job executed successfully.
> >>
> >> JobManager exited after the job is completed, but is still running,
> which is not expected.
> >>
> >> Any configurations have to be added to exit both JobManager and
> TaskManger?
> >>
> >> Versions:
> >>
> >> Flink - 1.11.0
> >>
> >> Java - 1.8
> >>
> >>
> >> --
> >> A.Narasimha Swamy
> >>
> >
> >
> > --
> > A.Narasimha Swamy
> >
>

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi Narasimha,

I am not sure why the TMs are not shutting down, as Yun said, so I am
cc'ing Till here as he may be able to shed some light.
For the application mode, the documentation page that you pointed to
describes the recommended way to deploy an application.

Cheers,
Kostas

On Mon, Aug 10, 2020 at 11:16 AM narasimha <sw...@gmail.com> wrote:
>
> Thanks, Yun for the prompt reply.
>
> TaskManager was actively looking for ResourceManager, on timeout of 5 mins it got terminated.
>
> Any recommendations around this? Or is this the way this will work.
>
> What should be done around this to make the application start in application deployment mode?
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#start-flink-application
>
> Here it has shown to invoke Flink binary to start. Is this the preferred way?
>
>
> On Mon, Aug 10, 2020 at 1:46 PM Yun Tang <my...@live.com> wrote:
>>
>> Hi
>>
>> From your description, the task managers are still alive even the job is finished and job manager has shut down?
>> If so, I think this is really weird, could you check what the TM is doing via jstack and the logs in job manager and idle task manager?
>> The task manager should be released when the JM is shutting down.
>> Moreover, idle task manager would also release after 30 seconds by default [1].
>>
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#resourcemanager-taskmanager-timeout
>>
>> Best
>> Yun Tang
>>
>>
>> ________________________________
>> From: narasimha <sw...@gmail.com>
>> Sent: Monday, August 10, 2020 15:36
>> To: user@flink.apache.org <us...@flink.apache.org>
>> Subject: TaskManagers are still up even after job execution completed in PerJob deployment mode
>>
>>
>> I'm trying out Flink Per-Job deployment using docker-compose.
>>
>> Configurations:
>>
>> version: "2.2"
>> jobs:
>>   jobmanager:
>>     build: ./
>>     image: flink_local:1.1
>>     ports:
>>       - "8081:8081"
>>     command: standalone-job --job-classname com.organization.BatchJob
>>     environment:
>>       - |
>>         FLINK_PROPERTIES=
>>         jobmanager.rpc.address: jobmanager
>>         parallelism.default: 2
>>   taskmanager:
>>     image: flink_local:1.1
>>     depends_on:
>>       - jobmanager
>>     command: taskmanager
>>     scale: 1
>>     environment:
>>       - |
>>         FLINK_PROPERTIES=
>>         jobmanager.rpc.address: jobmanager
>>         taskmanager.numberOfTaskSlots: 2
>>         parallelism.default: 2
>>
>> Flink image is extended with job.jar, Job executed successfully.
>>
>> JobManager exited after the job is completed, but is still running, which is not expected.
>>
>> Any configurations have to be added to exit both JobManager and TaskManger?
>>
>> Versions:
>>
>> Flink - 1.11.0
>>
>> Java - 1.8
>>
>>
>> --
>> A.Narasimha Swamy
>>
>
>
> --
> A.Narasimha Swamy
>

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

Posted by narasimha <sw...@gmail.com>.
Thanks, Yun for the prompt reply.

The TaskManager was actively looking for the ResourceManager, and after
the 5-minute timeout it got terminated.

Any recommendations around this? Or is this just the way it works?

What should be done to start the application in application deployment
mode?

https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#start-flink-application

It shows invoking the Flink binary to start the application. Is this the
preferred way?


On Mon, Aug 10, 2020 at 1:46 PM Yun Tang <my...@live.com> wrote:

> Hi
>
> From your description, the task managers are still alive even the job is
> finished and job manager has shut down?
> If so, I think this is really weird, could you check what the TM is doing
> via jstack and the logs in job manager and idle task manager?
> The task manager should be released when the JM is shutting down.
> Moreover, idle task manager would also release after 30 seconds by default
> [1].
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#resourcemanager-taskmanager-timeout
>
> Best
> Yun Tang
>
>
> ------------------------------
> *From:* narasimha <sw...@gmail.com>
> *Sent:* Monday, August 10, 2020 15:36
> *To:* user@flink.apache.org <us...@flink.apache.org>
> *Subject:* TaskManagers are still up even after job execution completed
> in PerJob deployment mode
>
>
> I'm trying out Flink Per-Job deployment using docker-compose.
>
> Configurations:
>
> version: "2.2"
> jobs:
>   jobmanager:
>     build: ./
>     image: flink_local:1.1
>     ports:
>       - "8081:8081"
>     command: standalone-job --job-classname com.organization.BatchJob
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         parallelism.default: 2
>   taskmanager:
>     image: flink_local:1.1
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
>         parallelism.default: 2
>
> Flink image is extended with job.jar, Job executed successfully.
>
> JobManager exited after the job is completed, but is still running, which
> is not expected.
>
> Any configurations have to be added to exit both JobManager and TaskManger
> ?
>
> Versions:
>
> Flink - 1.11.0
>
> Java - 1.8
>
> --
> A.Narasimha Swamy
>
>

-- 
A.Narasimha Swamy

Re: TaskManagers are still up even after job execution completed in PerJob deployment mode

Posted by Yun Tang <my...@live.com>.
Hi

From your description, the task managers are still alive even though the job is finished and the job manager has shut down?
If so, I think this is really weird. Could you check what the TM is doing via jstack, and check the logs of the job manager and the idle task manager?
The task manager should be released when the JM is shutting down.
Moreover, an idle task manager would also be released after 30 seconds by default [1].


[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#resourcemanager-taskmanager-timeout
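
For the docker-compose setup you posted, the logs are probably the easiest
first check (service names taken from your compose file):

  docker-compose logs jobmanager
  docker-compose logs taskmanager

And if you want to experiment with the idle timeout from [1], it can be set
explicitly in FLINK_PROPERTIES, e.g. resourcemanager.taskmanager-timeout: 30000
(30000 ms, i.e. 30 seconds, is the default).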

Best
Yun Tang


________________________________
From: narasimha <sw...@gmail.com>
Sent: Monday, August 10, 2020 15:36
To: user@flink.apache.org <us...@flink.apache.org>
Subject: TaskManagers are still up even after job execution completed in PerJob deployment mode


I'm trying out Flink Per-Job deployment using docker-compose.

Configurations:

version: "2.2"
services:
  jobmanager:
    build: ./
    image: flink_local:1.1
    ports:
      - "8081:8081"
    command: standalone-job --job-classname com.organization.BatchJob
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        parallelism.default: 2
  taskmanager:
    image: flink_local:1.1
    depends_on:
      - jobmanager
    command: taskmanager
    scale: 1
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2
        parallelism.default: 2


The Flink image is extended with job.jar, and the job executed successfully.

The JobManager exited after the job completed, but the TaskManager is still running, which is not expected.

Are there any configurations that have to be added so that both the JobManager and TaskManager exit?

Versions:

Flink - 1.11.0

Java - 1.8

--
A.Narasimha Swamy