Posted to commits@airflow.apache.org by "Steven Miller (Jira)" <ji...@apache.org> on 2019/09/17 16:19:00 UTC

[jira] [Commented] (AIRFLOW-2970) Kubernetes logging is broken

    [ https://issues.apache.org/jira/browse/AIRFLOW-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931615#comment-16931615 ] 

Steven Miller commented on AIRFLOW-2970:
----------------------------------------

Astronomer is debugging a similar issue in 1.10.x. If you try to download the logs from the UI (not view them: download them by clicking the button with a number on it), you get a network error, and the webserver logs show this stack trace:
```
[2019-09-17 11:51:11 +0000] [10636] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 181, in handle_request
    for item in respiter:
  File "/usr/lib/python3.7/site-packages/werkzeug/wsgi.py", line 507, in __next__
    return self._next()
  File "/usr/lib/python3.7/site-packages/werkzeug/wrappers/base_response.py", line 45, in _iter_encoded
    for item in iterable:
  File "/usr/lib/python3.7/site-packages/airflow/www_rbac/views.py", line 600, in _generate_log_stream
    logs, metadata = _get_logs_with_metadata(try_number, metadata)
  File "/usr/lib/python3.7/site-packages/airflow/www_rbac/views.py", line 569, in _get_logs_with_metadata
    logs, metadatas = handler.read(ti, try_number, metadata=metadata)
  File "/usr/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py", line 164, in read
    log, metadata = self._read(task_instance, try_number, metadata)
  File "/usr/lib/python3.7/site-packages/airflow/utils/log/es_task_handler.py", line 144, in _read
    and offset >= metadata['max_offset']:
TypeError: '>=' not supported between instances of 'str' and 'int'
```

Then it is the same problem we are experiencing. If that is the case, this change is what we are using to patch it while we get to the bottom of what's going on. [https://github.com/astronomer/airflow/pull/63]
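The traceback points at a comparison between a string and an integer in `es_task_handler.py`. A plausible reading (an assumption on my part, not confirmed in this thread) is that `metadata['max_offset']` arrives as a string when the metadata round-trips through request parameters, while `offset` is an int. A minimal sketch of the failure mode and the kind of defensive cast a patch could apply; `should_stop` is a hypothetical helper, not the actual Airflow code:

```python
# Sketch of the comparison that raises TypeError, plus a defensive fix.
# Assumption: metadata round-trips through JSON/query params, so the
# numeric 'max_offset' field may come back as a str.

def should_stop(offset, metadata):
    """Return True once the read offset reaches max_offset."""
    max_offset = metadata.get('max_offset')
    if max_offset is None:
        return False
    # Buggy form:  offset >= max_offset  -> TypeError when max_offset is a str.
    # Defensive form: coerce both sides to int before comparing.
    return int(offset) >= int(max_offset)

print(should_stop(100, {'max_offset': '50'}))  # compares 100 >= 50
print(should_stop(10, {'max_offset': '50'}))   # compares 10 >= 50
```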

> Kubernetes logging is broken
> ----------------------------
>
>                 Key: AIRFLOW-2970
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2970
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executors
>            Reporter: Jon Davies
>            Assignee: Daniel Imberman
>            Priority: Major
>
> I'm using Airflow with the Kubernetes executor and the pod operator. My DAGs are configured with get_log=True and set to log to stdout, and I can see all the logs in kubectl logs.
> I can see that the scheduler logs things to: $AIRFLOW_HOME/logs/scheduler/2018-08-28/*
> However, this just consists of:
> {code:java}
> [2018-08-28 13:03:27,695] {jobs.py:385} INFO - Started process (PID=16994) to work on /home/airflow/dags/dag.py
> [2018-08-28 13:03:27,697] {jobs.py:1782} INFO - Processing file /home/airflow/dags/dag.py for tasks to queue
> [2018-08-28 13:03:27,697] {logging_mixin.py:95} INFO - [2018-08-28 13:03:27,697] {models.py:258} INFO - Filling up the DagBag from /home/airflow/dags/dag.py
> {code}
> If I quickly exec into the executor pod the scheduler spins up, I can see that things are properly logged to:
> {code:java}
> /home/airflow/logs/dag$ tail -f dag-downloader/2018-08-28T13\:05\:07.704072+00\:00/1.log
> [2018-08-28 13:05:24,399] {logging_mixin.py:95} INFO - [2018-08-28 13:05:24,399] {pod_launcher.py:112} INFO - Event: dag-downloader-015ca48c had an event of type Pending
> ...
> [2018-08-28 13:05:37,193] {logging_mixin.py:95} INFO - [2018-08-28 13:05:37,193] {pod_launcher.py:95} INFO - b'INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (7): blah-blah.s3.eu-west-1.amazonaws.com\n'
> ...
> ...all other log lines from pod...
> {code}
> However, this executor pod only exists for the duration of the lifetime of the task pod so the logs are lost pretty much immediately after the task runs. There is nothing that ships the logs back to the scheduler and/or web UI.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)