Posted to user@flink.apache.org by "G.S.Vijay Raajaa" <gs...@gmail.com> on 2017/07/06 19:43:27 UTC

Flink Jobs disappear

Hi,

I am running the Flink TaskManager and JobManager as Docker containers.
Strangely, the jobs disappear from the web portal after some time, and they
don't move to the failed state either. I haven't been able to find a clue in
the logs. Any pointers would be really helpful.

Kindly let me know whether I need specific tuning, and how I can persist the
uploaded JARs.

Regards,
Vijay Raajaa G S

Re: Flink Jobs disappear

Posted by Joshua Griffith <JG...@CampusLabs.com>.
Are your containers on separate nodes? Are you running in Kubernetes? Have you set hard resource limits?

When I’ve run into this issue, it’s been because the JobManager was restarted (I wasn’t running in HA mode). Your node could have been restarted, or Docker could have OOM-killed the process if the machine was low on memory. You might want to run `docker ps` to see whether your containers are restarting. Exit code 137 probably means that they were OOM-killed.
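The restart check described above can be scripted. The following is a minimal Python sketch, not an official tool: it assumes the JSON array that `docker inspect <container>` prints and reads only the `Name`, `RestartCount`, `State.ExitCode` and `State.OOMKilled` fields. The function name `diagnose` and the script name are hypothetical. An exit code above 128 follows the shell convention of 128 + signal number, so 137 decodes to signal 9 (SIGKILL), which is the signal the kernel OOM killer sends.

```python
import json
import signal
import subprocess
import sys


def diagnose(inspect_json: str) -> list:
    """Summarise restart and exit information from `docker inspect` JSON output."""
    findings = []
    for container in json.loads(inspect_json):  # docker inspect prints a JSON array
        name = container.get("Name", "?").lstrip("/")
        state = container.get("State", {})
        code = state.get("ExitCode", 0)
        line = f"{name}: restarts={container.get('RestartCount', 0)}, exit={code}"
        if code > 128:
            # Shell convention: exit code 128 + N means "killed by signal N".
            try:
                line += f" (killed by {signal.Signals(code - 128).name})"
            except ValueError:  # not a signal number known on this platform
                line += f" (killed by signal {code - 128})"
        if state.get("OOMKilled"):
            line += ", OOMKilled=true"
        findings.append(line)
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python diagnose.py <container> [<container> ...]
    raw = subprocess.run(["docker", "inspect", *sys.argv[1:]],
                         capture_output=True, text=True, check=True).stdout
    print("\n".join(diagnose(raw)))
```

For example, `python diagnose.py jobmanager taskmanager1` (placeholder container names). A SIGKILL exit on its own does not prove an OOM kill; `OOMKilled=true` is the stronger signal.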

I wouldn’t run the JobManager on the same node as TaskManagers unless you’re using hard resource limits. Note: if you decide to go the hard resource limit route, know that Docker OOM-kills based on VIRT, not RSS (watch out for mmap).

> On Jul 8, 2017, at 1:54 AM, Chesnay Schepler <ch...@apache.org> wrote:
> 
> If a TaskManager ran out of memory, there should be something in the JobManager logs about an unreachable TaskManager.
> That said, there should also be something in the JobManager logs about the job disappearing...
> 
> Could you set the logging level to DEBUG, run the job again, and provide us (or me directly) with the logs?
> 
> Regards,
> Chesnay
> 
> On 08.07.2017 08:44, G.S.Vijay Raajaa wrote:
>> Hi Chesnay,
>> 
>> 
>> I am currently using Flink 1.3 in Docker containers, not in HA mode. I have 3 TaskManagers and one JobManager. This happens randomly, not every time. Does it mean that a TaskManager ran out of memory? I am using more slots than available cores; I hope compute is shared round-robin. Any pointers on tuning and HA setup will be greatly appreciated.
>> 
>> Regards,
>> Vijay Raajaa GS
>> 
>> On Sat, Jul 8, 2017 at 12:04 PM, Chesnay Schepler <chesnay@apache.org <ma...@apache.org>> wrote:
>> Hello,
>> 
>> could you tell us a bit more about your setup? Which Flink version are you using, is HA enabled, does this happen every time, etc.?
>> Regards,
>> Chesnay
>> 
>> 
>> On 06.07.2017 21:43, G.S.Vijay Raajaa wrote:
>> Hi,
>> 
>> I am running the Flink TaskManager and JobManager as Docker containers. Strangely, the jobs disappear from the web portal after some time, and they don't move to the failed state either. I haven't been able to find a clue in the logs. Any pointers would be really helpful.
>> 
>> Kindly let me know whether I need specific tuning, and how I can persist the uploaded JARs.
>> 
>> Regards,
>> Vijay Raajaa G S
>> 
>> 
>> 
> 


Re: Flink Jobs disappear

Posted by Chesnay Schepler <ch...@apache.org>.
If a TaskManager ran out of memory, there should be something in the
JobManager logs about an unreachable TaskManager.
That said, there should also be something in the JobManager logs about
the job disappearing...

Could you set the logging level to DEBUG, run the job again, and provide 
us (or me directly) with the logs?

Regards,
Chesnay
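Switching to DEBUG, as requested above, comes down to changing the root logger level in Flink's log4j configuration. A minimal Python sketch, assuming the log4j 1.x properties format that Flink 1.3 ships (a line such as `log4j.rootLogger=INFO, file` in `conf/log4j.properties`; a Docker image may start Flink with a different file, such as `log4j-console.properties`). The helper `set_root_log_level` and the script name are hypothetical.

```python
import re
import sys

# Matches e.g. "log4j.rootLogger=INFO, file" and captures everything before the level.
ROOT_LOGGER = re.compile(r"^([ \t]*log4j\.rootLogger[ \t]*=[ \t]*)\w+", re.MULTILINE)


def set_root_log_level(properties: str, level: str = "DEBUG") -> str:
    """Return log4j 1.x properties text with the root logger level replaced."""
    updated, count = ROOT_LOGGER.subn(lambda m: m.group(1) + level, properties)
    if count == 0:
        raise ValueError("no log4j.rootLogger line found")
    return updated


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python set_level.py conf/log4j.properties  (path is an assumption)
    path = sys.argv[1]
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_root_log_level(text))
```

The sketch assumes the file is only read at startup, so the JobManager and TaskManager containers would need a restart before the new level takes effect.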



Re: Flink Jobs disappear

Posted by "G.S.Vijay Raajaa" <gs...@gmail.com>.
Hi Chesnay,


I am currently using Flink 1.3 in Docker containers, not in HA mode. I have
3 TaskManagers and one JobManager. This happens randomly, not every time.
Does it mean that a TaskManager ran out of memory? I am using more slots
than available cores; I hope compute is shared round-robin. Any pointers on
tuning and HA setup will be greatly appreciated.

Regards,
Vijay Raajaa GS
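For the HA question, Flink 1.3 configures ZooKeeper-based JobManager high availability through `flink-conf.yaml`. The sketch below only renders the core keys (`high-availability`, `high-availability.zookeeper.quorum`, `high-availability.storageDir`); the helper `ha_config`, the ZooKeeper hosts and the storage path are placeholders. A real setup also needs a running ZooKeeper ensemble and a way to bring a JobManager back (a standby instance or a container restart policy). With HA enabled, JobManager recovery metadata is written to `storageDir` and ZooKeeper only keeps pointers to it, which is what lets submitted jobs survive a JobManager restart.

```python
def ha_config(zk_quorum: str, storage_dir: str) -> str:
    """Render flink-conf.yaml lines for ZooKeeper-based JobManager HA."""
    settings = {
        "high-availability": "zookeeper",
        # Comma-separated host:port list of the ZooKeeper ensemble.
        "high-availability.zookeeper.quorum": zk_quorum,
        # Durable storage for JobManager recovery metadata; ZooKeeper only
        # stores pointers to it. Use shared storage (e.g. HDFS or S3), not a
        # path inside a single container.
        "high-availability.storageDir": storage_dir,
    }
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # Placeholder hosts and path; the output would be appended to
    # flink-conf.yaml on every JobManager and TaskManager.
    print(ha_config("zk1:2181,zk2:2181,zk3:2181", "hdfs:///flink/recovery"), end="")
```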


Re: Flink Jobs disappear

Posted by Chesnay Schepler <ch...@apache.org>.
Hello,

could you tell us a bit more about your setup? Which Flink version are you
using, is HA enabled, does this happen every time, etc.?
Regards,
Chesnay
