Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/01 09:42:33 UTC

[GitHub] [airflow] noose opened a new issue, #25438: DAGs stuck in the "running" state

noose opened a new issue, #25438:
URL: https://github.com/apache/airflow/issues/25438

   ### Apache Airflow version
   
   2.3.3 (latest released)
   
   ### What happened
   
To be honest, I don't know why it stopped working properly.
   In our process we have 2 DAGs per client: the first DAG has 3 tasks, the second one has 5-8 tasks. In general the first DAG should take ~3 min and the second one ~5-10 min to finish. A week ago we added 2 new clients with a similar amount of data to that of our previous customers, and Airflow started to behave strangely.
   DAGs (different ones, not only those of the 2 new customers) stay in the `running` state for hours, even though all their tasks finish within a few minutes of the start; meanwhile the worker is doing "something" that does not show up in the logs and drives the load up to ~12, where under normal conditions it is < 1. In other cases a DAG is in the `running` state while a task sits in the `queued` (or `no_status`) state for hours.
   We've mitigated the issue by restarting the workers and schedulers every hour, but that is not a long-term or mid-term solution.
   
We're using the CeleryExecutor (on Kubernetes: 1 pod = 1 worker). Changing the concurrency, e.g. from 4 to 1, does not help. On the worker pod the process list shows only celery, gunicorn and the current task.
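
   (For reference, a quick way to confirm the concurrency value a worker actually picked up, from a Python shell inside the worker pod — a sketch, assuming a stock Airflow install:)

   ```python
   # Sketch: print the effective Celery worker concurrency. The value may
   # come from [celery] worker_concurrency in airflow.cfg or from the
   # AIRFLOW__CELERY__WORKER_CONCURRENCY environment variable.
   from airflow.configuration import conf

   print(conf.getint("celery", "worker_concurrency"))
   ```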
   
We were on `apache/airflow:2.2.5-python3.8`, and now run `apache/airflow:2.3.3-python3.8` with the same problems.
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
Debian GNU/Linux 11 (bullseye) on the pods; Amazon Linux on the EKS nodes
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==4.0.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.1.0
   apache-airflow-providers-docker==3.0.0
   apache-airflow-providers-elasticsearch==4.0.0
   apache-airflow-providers-ftp==3.0.0
   apache-airflow-providers-google==8.1.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.0.0
   apache-airflow-providers-http==3.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.0.0
   apache-airflow-providers-mysql==3.0.0
   apache-airflow-providers-odbc==3.0.0
   apache-airflow-providers-postgres==5.0.0
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==3.0.0
   apache-airflow-providers-slack==5.0.0
   apache-airflow-providers-sqlite==3.0.0
   apache-airflow-providers-ssh==3.0.0
   ```
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
The Airflow scheduler, webserver, workers and Redis run on our EKS cluster, deployed via our own Helm charts.
   
   We also use RDS (PostgreSQL).
   
   ### Anything else
   
   ```bash
     ____________       _____________
    ____    |__( )_________  __/__  /________      __
   ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
   ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
    _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   [2022-08-01 08:42:48,042] {{scheduler_job.py:708}} INFO - Starting the scheduler
   [2022-08-01 08:42:48,042] {{scheduler_job.py:713}} INFO - Processing each file at most -1 times
   [2022-08-01 08:42:48,238] {{executor_loader.py:105}} INFO - Loaded executor: CeleryExecutor
   [2022-08-01 08:42:48,243] {{manager.py:160}} INFO - Launched DagFileProcessorManager with pid: 29
   [2022-08-01 08:42:48,245] {{scheduler_job.py:1233}} INFO - Resetting orphaned tasks for active dag runs
   [2022-08-01 08:42:48,247] {{settings.py:55}} INFO - Configured default timezone Timezone('Europe/Berlin')
   /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/log/file_task_handler.py:52 DeprecationWarning: Passing filename_template to FileTaskHandler is deprecated and has no effect
   [2022-08-01 08:42:48,330] {{celery_executor.py:532}} INFO - Adopted the following 1 tasks from a dead executor
           <TaskInstance: uploads_customer_xxx_v5.calculation_1 custom__2022-07-30 17:13:39+00:00_1 [queued]> in state STARTED
   ```
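
   The last line above shows the scheduler adopting a task from a dead executor. A sketch for listing the task instances stuck in `queued` (assuming stock Airflow 2.3 APIs and access to the metadata DB, e.g. from the scheduler pod):

   ```python
   # Sketch: list task instances stuck in the "queued" state, to see which
   # DAGs/tasks the scheduler still believes are waiting on the executor.
   from airflow.models import TaskInstance
   from airflow.utils.session import create_session
   from airflow.utils.state import State

   with create_session() as session:
       stuck = (
           session.query(TaskInstance)
           .filter(TaskInstance.state == State.QUEUED)
           .all()
       )
       for ti in stuck:
           print(ti.dag_id, ti.task_id, ti.run_id, ti.queued_dttm)
   ```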
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #25438: DAGs stuck in the "running" state

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25438: DAGs stuck in the "running" state
URL: https://github.com/apache/airflow/issues/25438




[GitHub] [airflow] boring-cyborg[bot] commented on issue #25438: DAGs stuck in the "running" state

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25438:
URL: https://github.com/apache/airflow/issues/25438#issuecomment-1200966839

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] em-eman commented on issue #25438: DAGs stuck in the "running" state

Posted by GitBox <gi...@apache.org>.
em-eman commented on issue #25438:
URL: https://github.com/apache/airflow/issues/25438#issuecomment-1201005449

Just to add more insight to the issue created by my colleague:
   
   There are two types of DAGs. DAG one acts as an internal scheduler: it runs with a 1-minute heartbeat to check for files in S3, and once a file arrives in S3 it triggers the second DAG, which does the data processing and the other Python operators in our pipeline.
   DAG one (scheduler) -> checks for a file in S3 and triggers DAG two
   <img width="879" alt="Screenshot 2022-08-01 at 12 11 24" src="https://user-images.githubusercontent.com/56021073/182126369-e7250b91-ede8-4af6-8158-b4a01469d887.png">
   DAG two (upload DAG)
   <img width="879" alt="Screenshot 2022-08-01 at 12 15 10" src="https://user-images.githubusercontent.com/56021073/182126988-76edf23b-3784-4427-95c1-d1b2711691c4.png">
   
   
   The problem happens in the second DAG: the DAG run gets stuck on any of the above tasks — the task reaches a success state in Airflow, while in Flower (the Celery worker UI) the task remains in an active/running state forever. It seems the issue lies in the scheduler-to-worker communication about task state.
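   
   A minimal sketch of this pattern, in case it helps reproduce (all names, buckets and keys here are hypothetical, not our actual code):
   
   ```python
   # Sketch of the pattern described above: DAG one polls S3 for a file
   # every minute and triggers DAG two (the upload/processing DAG).
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.operators.trigger_dagrun import TriggerDagRunOperator
   from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

   with DAG(
       dag_id="check_s3_and_trigger",           # "DAG one" (internal scheduler)
       start_date=datetime(2022, 7, 1),
       schedule_interval=timedelta(minutes=1),  # the 1-minute heartbeat
       max_active_runs=1,
       catchup=False,
   ) as dag_one:
       wait_for_file = S3KeySensor(
           task_id="wait_for_file",
           bucket_name="customer-bucket",       # hypothetical bucket
           bucket_key="incoming/*.csv",
           wildcard_match=True,
           poke_interval=10,
           timeout=50,          # give up before the next scheduled run
           soft_fail=True,      # a missing file is not a failure
       )
       trigger_upload = TriggerDagRunOperator(
           task_id="trigger_upload_dag",
           trigger_dag_id="upload_and_process",  # "DAG two"
       )
       wait_for_file >> trigger_upload
   ```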

