Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/19 14:02:54 UTC

[GitHub] [airflow] csp98 opened a new issue, #27140: Invalid livenessProbe for Standalone DAG Processor

csp98 opened a new issue, #27140:
URL: https://github.com/apache/airflow/issues/27140

   ### Official Helm Chart version
   
   1.7.0 (latest released)
   
   ### Apache Airflow version
   
   2.3.4
   
   ### Kubernetes Version
   
   1.22.12-gke.1200
   
   ### Helm Chart configuration
   
   ```yaml
     dagProcessor:
       enabled: true
   ```
   
   ### Docker Image customisations
   
   ```dockerfile
   FROM apache/airflow:2.3.4-python3.9
   
   USER root
   RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
   RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
   RUN apt-get update && apt-get install -y google-cloud-cli
   RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
   RUN sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
   USER airflow
   ```
   
   ### What happened
   
   Current DAG Processor livenessProbe is the following:
   ```
   CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
       airflow jobs check --hostname $(hostname)
   ```
   This command checks the metadata DB for an active job whose hostname matches the current pod's hostname (_airflow-dag-processor-xxxx_).
   However, even after the dag-processor pod has been running for more than 1 hour, the `job` table contains no jobs with the processor's hostname.
   ![image](https://user-images.githubusercontent.com/28935464/196711859-98dadb8f-3273-42ec-a4db-958890db34b7.png)
   ![image](https://user-images.githubusercontent.com/28935464/196711947-5a0fc5d7-4b91-4e82-9ff0-c721e6a4c1cd.png)
   
   As a consequence, the livenessProbe fails and the pod is constantly restarting.
   
   After investigating the code, I found that neither DagFileProcessorManager nor DagFileProcessor creates a job in the metadata DB, so the livenessProbe is not valid.
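
A minimal sketch of why the probe fails, assuming a heavily simplified `job` table (the real table has more columns) and a hypothetical `has_running_job` helper. It is not Airflow's implementation of `airflow jobs check`, and it ignores any heartbeat-freshness logic the real command may apply. The scheduler and triggerer hostnames are made up for illustration.

```python
import sqlite3

# Simplified stand-in for the metadata DB "job" table (assumption: reduced columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, job_type TEXT, state TEXT, hostname TEXT)"
)
# Only components that register a job get a row; the standalone
# dag-processor pod never writes one, so it is absent here.
conn.executemany(
    "INSERT INTO job (job_type, state, hostname) VALUES (?, ?, ?)",
    [
        ("SchedulerJob", "running", "airflow-scheduler-0"),  # hypothetical hostname
        ("TriggererJob", "running", "airflow-triggerer-0"),  # hypothetical hostname
    ],
)


def has_running_job(conn, hostname):
    """Hypothetical helper: True if a running job row exists for this hostname."""
    row = conn.execute(
        "SELECT COUNT(*) FROM job WHERE state = 'running' AND hostname = ?",
        (hostname,),
    ).fetchone()
    return row[0] > 0


scheduler_ok = has_running_job(conn, "airflow-scheduler-0")
dag_processor_ok = has_running_job(conn, "airflow-dag-processor-xxxx")
print(scheduler_ok, dag_processor_ok)
```

Under these assumptions the lookup succeeds for the scheduler but not for the dag-processor pod, which matches the restart loop described above.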
   
   ### What you think should happen instead
   
   The livenessProbe should be refactored so that it actually checks that the DAG processors are up and running.
   
   ### How to reproduce
   
   1. Deploy Airflow with a standalone dag-processor.
   2. Wait about 5 minutes.
   3. Check that the livenessProbe has been failing for those 5 minutes and that the pod has been restarted.
   
   ### Anything else
   
   I think this behavior is inherited from the non-standalone dag-processor mode, where the livenessProbe checks for a SchedulerJob, which in fact contains the "DagProcessorJob".
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1296533644

   @george-zubrienko - feel free to make a PR for that. Then you will not have to wait, and you will become one of the > 2200 contributors. It's easy!




[GitHub] [airflow] george-zubrienko commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
george-zubrienko commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1285389606

   By the way, the documentation is missing the liveness probe command entry. I found in the chart source code that it is overridable :)




[GitHub] [airflow] potiuk commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1291216927

   Assigned it for 1.8.0.




[GitHub] [airflow] george-zubrienko commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
george-zubrienko commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1305486356

   @potiuk thanks for the offer, and I'd love to when I have a bit more free time. I won't feel great if I commit to doing a PR and then go afk for a couple of weeks because there are issues in other projects (also OS ones) that need to be addressed. So not for this issue, but for future ones I'll consider it for sure :)




[GitHub] [airflow] farhan0syakir commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
farhan0syakir commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1375227467

   I think the correct approach is in #28799. Now the `airflow dag-processor` command creates a job, and that job appears in the `job` table.
   I'm a newbie, so I might have introduced another issue; please review it.




[GitHub] [airflow] potiuk closed issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #27140: Invalid livenessProbe for Standalone DAG Processor
URL: https://github.com/apache/airflow/issues/27140




[GitHub] [airflow] csp33 commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
csp33 commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1365681429

   Hi @farhan0syakir,
   I think removing _hostname_ works because the command then just checks whether there are any jobs in the metadata DB, no matter which kind of job they are (SchedulerJob, TriggererJob, etc.).
   However, that makes the livenessProbe useless, as it no longer checks that the dagProcessor is working.
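
To illustrate the false positive, here is a minimal sketch, assuming a simplified `job` table and a hypothetical `any_running_job` helper (neither is Airflow's actual code, and the scheduler hostname is made up). With the hostname filter dropped, a running scheduler alone is enough for the check to pass.

```python
import sqlite3

# Simplified stand-in for the metadata DB "job" table (assumption: reduced columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_type TEXT, state TEXT, hostname TEXT)")
# Only the scheduler has registered a job; the dag-processor has no row at all.
conn.execute(
    "INSERT INTO job VALUES ('SchedulerJob', 'running', 'airflow-scheduler-0')"
)


def any_running_job(conn, hostname=None):
    """Hypothetical helper: check for running jobs, optionally filtered by hostname."""
    sql = "SELECT COUNT(*) FROM job WHERE state = 'running'"
    params = []
    if hostname is not None:
        sql += " AND hostname = ?"
        params.append(hostname)
    return conn.execute(sql, params).fetchone()[0] > 0


with_hostname = any_running_job(conn, "airflow-dag-processor-xxxx")
without_hostname = any_running_job(conn)
print(with_hostname, without_hostname)
```

In this model the unfiltered check reports success even though nothing in the table says the dag-processor is alive, so a frozen processor would never be restarted.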




[GitHub] [airflow] george-zubrienko commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
george-zubrienko commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1285403881

   In our case we would see this in Datadog alerts. That is better than a non-working liveness probe that ends up in CrashLoopBackOff and delays DAG processing.




[GitHub] [airflow] farhan0syakir commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
farhan0syakir commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1375271058

   ![Screenshot 2023-01-09 103904](https://user-images.githubusercontent.com/10477597/211268309-42e0b97b-433f-49a7-89c9-d9443b570000.png)
   Here is a screenshot from my local environment as proof.




[GitHub] [airflow] farhan0syakir commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
farhan0syakir commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1365291154

   I have tested that removing --hostname resolves the issue.
   
   If my approach is good, I will add unit tests.
   
   Please review and let me know if there are any further changes needed.




[GitHub] [airflow] csp98 commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
csp98 commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1285394886

   Yeah @george-zubrienko, however, disabling a livenessProbe is dangerous: the pod wouldn't be restarted if the processor freezes.




[GitHub] [airflow] george-zubrienko commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
george-zubrienko commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1285417323

   Yeah, but to be fair, this feature is long overdue. No problem re-enabling the probe when it works - crossing fingers this issue gets attention :)




[GitHub] [airflow] farhan0syakir commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
farhan0syakir commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1365957300

   Thanks for pointing that out, @csp33.
   I closed the pull request; I might try another approach later.




[GitHub] [airflow] csp98 commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
csp98 commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1285407355

   our fix for the moment is to use the non-standalone dag-processor ;)




[GitHub] [airflow] george-zubrienko commented on issue #27140: Invalid livenessProbe for Standalone DAG Processor

Posted by GitBox <gi...@apache.org>.
george-zubrienko commented on issue #27140:
URL: https://github.com/apache/airflow/issues/27140#issuecomment-1285388914

   A temporary workaround for this is to set the liveness probe command to `exit 0`.
   
   In a Terraform `helm_release`:
   ```
   ...
     set {
       name  = "dagProcessor.livenessProbe.command[0]"
       value = "sh"
     }
     set {
       name  = "dagProcessor.livenessProbe.command[1]"
       value = "-c"
     }
     set {
       name  = "dagProcessor.livenessProbe.command[2]"
       value = "exit 0"
     }
   ```
   Or in a Helm deploy:
   ```
   --set dagProcessor.livenessProbe.command[0] ...
   ```

