Posted to dev@airflow.apache.org by "JOAQUIM, Kevin" <ke...@sfr.com.INVALID> on 2020/09/03 09:51:51 UTC

KubernetesPodOperator ERROR - Connection broken but pod writes logs

Hi team,

I have an ERROR in production on my KubernetesPodOperator tasks, with the log message: ERROR - Connection broken.

When my job is running, my pod writes logs that are visible in the Airflow interface. But at some point Airflow stops retrieving the logs from the pod and turns the task into a zombie, and after scheduler_zombie_task_threshold the job is killed. Yet when I log into the pod, I can see that it is still writing logs and there is activity. The network and the Kubernetes cluster are fine.

I found this issue: https://issues.apache.org/jira/browse/AIRFLOW-3534 , but my pod writes logs continuously to stdout on Kubernetes.

Why does Airflow stop collecting the task logs from my KubernetesPodOperator? And why does Airflow monitor the status of the task through log collection?

I can set get_logs to False, but that doesn't solve my problem. I'm using Airflow v1.10.10.
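
For context, my task is defined roughly like the sketch below (a minimal illustration with placeholder DAG, task, and image names; only the get_logs flag is relevant here):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    # Placeholder DAG; the real one runs a long batch job.
    dag = DAG(
        dag_id="my_k8s_dag",
        start_date=datetime(2020, 9, 1),
        schedule_interval=None,
    )

    # get_logs=True (the default) makes Airflow stream the pod's stdout into the
    # task log; setting it to False skips the streaming but does not fix the zombie.
    my_task = KubernetesPodOperator(
        task_id="my_long_running_task",
        name="my-long-running-pod",
        namespace="default",
        image="my-registry/my-job:latest",
        cmds=["python", "run_job.py"],
        get_logs=True,
        is_delete_operator_pod=True,
        dag=dag,
    )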

Can anyone help me? Thanks in advance.

Regards,

Kevin JOAQUIM

Re: KubernetesPodOperator ERROR - Connection broken but pod writes logs

Posted by Daniel Holleran <da...@lendico.de>.
Hi Kevin,

I'm not sure if it's exactly the same issue, but I recently had a similar
problem where Airflow stopped reading logs from certain pods and thought they
were still running when in fact they had completed. It only affected pods
that had gaps of a few minutes between log lines.

I was able to fix it by using the code in this pull request:
https://github.com/apache/airflow/pull/7428 to build a custom KubernetesPodOperator
and pod launcher, so that it polls the pod status instead of relying on the log stream.

In this case, the issue is not with the Airflow code itself; rather, the
Python Kubernetes client does not behave as expected.
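
To give a rough idea of the approach, the sketch below checks the pod's phase with the Kubernetes Python client instead of trusting the log stream to stay open. This is a simplified illustration, not the actual code from the pull request, and the function name is mine:

    import time

    from kubernetes import client, config


    def wait_for_pod_completion(name, namespace, poll_interval=10):
        """Poll the pod's phase until it finishes, independently of the log stream."""
        config.load_incluster_config()  # use config.load_kube_config() outside the cluster
        v1 = client.CoreV1Api()
        while True:
            pod = v1.read_namespaced_pod(name=name, namespace=namespace)
            phase = pod.status.phase  # Pending / Running / Succeeded / Failed / Unknown
            if phase in ("Succeeded", "Failed"):
                return phase
            time.sleep(poll_interval)

That way a dropped log connection no longer decides whether the task is considered alive.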

Hope this helps

Regards,
Daniel

On Fri, 18 Sep 2020 at 14:02, JOAQUIM, Kevin <ke...@sfr.com.invalid>
wrote:

> Hi team,
>
> Is there no one who can help me?
>
> I changed get_logs to False, but that's not the solution.
>
> Kevin
>
> From: JOAQUIM, Kevin
> Sent: Thursday, 3 September 2020 11:52
> To: users@airflow.apache.org; dev@airflow.apache.org; !Admin-ordo <adminordo@sfr.com>; LEVESQUE, Eric <er...@sfr.com>
> Subject: KubernetesPodOperator ERROR - Connection broken but pod writes logs
>
> Hi team,
>
> I have an ERROR in production on my KubernetesPodOperator tasks, with the log
> message: ERROR - Connection broken.
>
> When my job is running, my pod writes logs that are visible in the Airflow
> interface. But at some point Airflow stops retrieving the logs from the pod
> and turns the task into a zombie, and after scheduler_zombie_task_threshold
> the job is killed. Yet when I log into the pod, I can see that it is still
> writing logs and there is activity. The network and the Kubernetes cluster
> are fine.
>
> I found this issue: https://issues.apache.org/jira/browse/AIRFLOW-3534 , but
> my pod writes logs continuously to stdout on Kubernetes.
>
> Why does Airflow stop collecting the task logs from my KubernetesPodOperator?
> And why does Airflow monitor the status of the task through log collection?
>
> I can set get_logs to False, but that doesn't solve my problem. I'm using
> Airflow v1.10.10.
>
> Can anyone help me? Thanks in advance.
>
> Regards,
>
> Kevin JOAQUIM
>


-- 
Daniel Holleran
Senior Data Engineer

Lendico Deutschland GmbH | c/o Techspace | Lobeckstraße 36-40 | 10969 Berlin
Managing Directors: Sven Foos (Chairman), Thomas Becher, Verena Freyer, Friedrich Hubel, Martin Kohlbeck, Florian Strobel
Registered with the Charlottenburg Local Court, HRB 140644B


RE: KubernetesPodOperator ERROR - Connection broken but pod writes logs

Posted by "JOAQUIM, Kevin" <ke...@sfr.com>.
Hi team,

Is there no one who can help me?

I changed get_logs to False, but that's not the solution.

Kevin

From: JOAQUIM, Kevin
Sent: Thursday, 3 September 2020 11:52
To: users@airflow.apache.org; dev@airflow.apache.org; !Admin-ordo <ad...@sfr.com>; LEVESQUE, Eric <er...@sfr.com>
Subject: KubernetesPodOperator ERROR - Connection broken but pod writes logs

Hi team,

I have an ERROR in production on my KubernetesPodOperator tasks, with the log message: ERROR - Connection broken.

When my job is running, my pod writes logs that are visible in the Airflow interface. But at some point Airflow stops retrieving the logs from the pod and turns the task into a zombie, and after scheduler_zombie_task_threshold the job is killed. Yet when I log into the pod, I can see that it is still writing logs and there is activity. The network and the Kubernetes cluster are fine.

I found this issue: https://issues.apache.org/jira/browse/AIRFLOW-3534 , but my pod writes logs continuously to stdout on Kubernetes.

Why does Airflow stop collecting the task logs from my KubernetesPodOperator? And why does Airflow monitor the status of the task through log collection?

I can set get_logs to False, but that doesn't solve my problem. I'm using Airflow v1.10.10.

Can anyone help me? Thanks in advance.

Regards,

Kevin JOAQUIM
