Posted to commits@airflow.apache.org by "RNHTTR (via GitHub)" <gi...@apache.org> on 2023/03/08 20:07:11 UTC

[GitHub] [airflow] RNHTTR opened a new issue, #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

RNHTTR opened a new issue, #29985:
URL: https://github.com/apache/airflow/issues/29985

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   Today, if a DAG is running and has a top-level Variable.get() call, and that Variable is deleted while the DAG is still running, the DAG's next task(s) will fail with no task logs (assuming the Variable eventually gets re-introduced), because the DAG cannot be parsed.
   
   I suspect this would happen with anything that can cause an import error, but I’m able to easily reproduce it using Variables.
   
   ### What you think should happen instead
   
   Once the DAG is able to be parsed/imported, any tasks that failed due to this problem should say something along the lines of "Task timed out due to a DAG import or parse error."
   
   ### How to reproduce
   
   
   1. Create an Airflow Variable with the key "some_variable"
   2. Trigger a DAG that makes a top-level call to Variable.get("some_variable"). While the DAG is running, delete the Airflow Variable.
   3. Wait 10-15 minutes or so
   4. Recreate an Airflow Variable with the key "some_variable"
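
The reproduction steps above can be sketched without a running Airflow deployment. This is only a simulation under stated assumptions: the `Variable` class, the `_VARIABLES` dict, and `parse_dag_file` are hypothetical stand-ins, not Airflow code. The one behaviour borrowed from Airflow is that `Variable.get` raises `KeyError` for a missing key when no default is given.

```python
# Stand-in for the Airflow metadata store: a plain dict, NOT the real Airflow API.
_VARIABLES = {"some_variable": "hello"}


class Variable:
    """Hypothetical stand-in that raises KeyError for a missing key."""

    @staticmethod
    def get(key):
        if key not in _VARIABLES:
            raise KeyError(f"Variable {key} does not exist")
        return _VARIABLES[key]


# Source of a DAG file with a top-level lookup, evaluated on every parse.
DAG_FILE_SOURCE = '''
value = Variable.get("some_variable")  # runs at parse time, not at task run time
tasks = ["print_value"]
'''


def parse_dag_file():
    """Simulate one parse of the DAG file (scheduler side or worker side)."""
    namespace = {"Variable": Variable}
    exec(DAG_FILE_SOURCE, namespace)
    return namespace["tasks"]


def try_parse():
    """Return 'ok', or a short description of the import error."""
    try:
        parse_dag_file()
        return "ok"
    except KeyError as err:
        return f"import error: {err}"


timeline = []
timeline.append(try_parse())                 # steps 1-2: Variable exists, DAG parses
del _VARIABLES["some_variable"]              # step 2: Variable deleted mid-run
timeline.append(try_parse())                 # step 3: a re-parse now fails
_VARIABLES["some_variable"] = "hello again"  # step 4: Variable recreated
timeline.append(try_parse())                 # DAG parses again
print(timeline)
```

The middle entry of `timeline` is the window in which tasks fail; once the last parse succeeds, nothing in the parsed DAG itself records why they failed.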
   
   
   ### Operating System
   
   n/a
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   Re-opening #29984 after discussion regarding improved logging capabilities making it possible to surface this information to DAG authors rather than administrators.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1463763707

   Yeah. I thought the same, @eladkal, but after the discussion I agree it would be **quite some** improvement for what is effectively a "dag versioning" issue, though only for a very specific combination of three things that must all happen:
   
   1) parsing the dag in DAGFileProcessor produced the right structure of serialized DAG and scheduler already scheduled the task
   2) but parsing the DAG in worker fails to produce the actual task "code" to run for this task
   3) and it is either an intermittent issue (like a connectivity problem) or an issue where parsing DAGs on the worker produces a different result than parsing them in DAGFileProcessor (which might be because the DAG author made wrong assumptions).
   
   If all three things happen (which is not that unlikely in a real environment), then the error details can be printed to the task log (which the worker can write to at the moment DAG parsing fails during the "airflow tasks run" command), and they might then be seen by those who operate Airflow rather than only by cluster admins.
   
   Note: the 3rd point above is important, because if the issue is not intermittent, then the task (and the log) will not even be visible in the UI to take a look at. This is why in https://github.com/apache/airflow/discussions/29984 I initially qualified it as a "versioning" issue (because if the task disappears we currently have no way to see the log anyway). But I agree that if the issue is intermittent, or if the worker consistently fails to parse the DAG while the DAGFileProcessor parses it fine, then it might help a lot in diagnosing the issue.
   
   This might be quite an improvement vs. the current approach where the errors are just printed out in the worker stdout and visible only to cluster admin people.
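
A minimal sketch of the idea described above, assuming a simplified worker: if loading the DAG fails, the traceback is written to the task's own log before the task is marked failed. `run_task`, `broken_loader`, and the logger name are hypothetical and do not correspond to Airflow internals; only the standard-library `logging` and `traceback` calls are real.

```python
import io
import logging
import traceback


def run_task(load_dag, task_id, task_log):
    """Hypothetical worker entry point: load the DAG, then run one task.

    `load_dag` is any callable returning a dict of task_id -> callable.
    """
    try:
        dag = load_dag()
    except Exception:
        # Write the parse failure into the task's own log before failing,
        # instead of leaving it only on the worker's stdout.
        task_log.error(
            "Task %s could not start: DAG import/parse failed on the worker.\n%s",
            task_id,
            traceback.format_exc(),
        )
        return "failed"
    dag[task_id]()
    return "success"


def broken_loader():
    """Simulates a DAG file whose top-level code fails on the worker."""
    raise KeyError("Variable some_variable does not exist")


# An in-memory buffer stands in for the per-task log file shown in the UI.
buffer = io.StringIO()
task_log = logging.getLogger("task_log_sketch")
task_log.setLevel(logging.INFO)
task_log.propagate = False
task_log.addHandler(logging.StreamHandler(buffer))

state = run_task(broken_loader, "print_value", task_log)
print(state)
print(buffer.getvalue())
```

The design choice here is that the handler is attached per task, so whoever can read the task log can also read the parse error.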
   




[GitHub] [airflow] potiuk commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1463772332

   Also one more comment: having this kind of issue is generally an indication of not following the best practices, because DAG parsing SHOULD NOT reach out to external systems, so in general DAG parsing should not really lead to intermittent issues. But I think we have to accept the fact that some people do that, and helping them to investigate the issue they have is not something we should necessarily block.




[GitHub] [airflow] eladkal commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1460874078

   If the DAG is broken, why do you expect a task log to appear? The DAG is broken, so nothing can work...




[GitHub] [airflow] collinmcnulty commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "collinmcnulty (via GitHub)" <gi...@apache.org>.
collinmcnulty commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1464170166

   Agree completely that this is an indication of not following best practices. I think this is a great opportunity to demonstrate to users why those best practices exist by giving them a clear error that tells them that failing to follow that best practice was the cause of the task failure.




[GitHub] [airflow] SamWheating commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "SamWheating (via GitHub)" <gi...@apache.org>.
SamWheating commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1468787613

   I am interested in this issue as it affects a lot of our users. I'll see what I can do to recreate it and surface some logs in the event of a parsing failure within the `airflow tasks run` command.




[GitHub] [airflow] RNHTTR commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "RNHTTR (via GitHub)" <gi...@apache.org>.
RNHTTR commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1460886968

   From the reproduction:
   
   
   1. Create an Airflow Variable with the key "some_variable"
   2. Trigger a DAG that makes a top-level call to Variable.get("some_variable"). While the DAG is running, delete the Airflow Variable.
   3. Wait 10-15 minutes or so
   4. Recreate an Airflow Variable with the key "some_variable"
   
   Recreating the Variable will fix the DAG, but any tasks that were attempted between (2) and (4) will have failed, and once the DAG is fixed those tasks will show as failed with no error message in the UI. Diagnosing this requires an administrator with access to worker logs; it'd be better if these logs could be surfaced to DAG authors.




[GitHub] [airflow] potiuk commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1464188491

   > Agree completely that this is an indication of not following best practices. I think this is a great opportunity to demonstrate to users why those best practices exist by giving them a clear error that tells them that failing to follow that best practice was the cause of the task failure.
   
   My thought exactly :) 

