Posted to commits@airflow.apache.org by "potiuk (via GitHub)" <gi...@apache.org> on 2023/03/10 12:54:12 UTC

[GitHub] [airflow] potiuk commented on issue #29985: Re-opening: In-flight DAG Import Errors Cause Tasks to Fail without Logs

potiuk commented on issue #29985:
URL: https://github.com/apache/airflow/issues/29985#issuecomment-1463763707

   Yeah. I thought the same, @eladkal, but after the discussion I agree it will be **quite some** improvement, not for what is effectively the "dag versioning" issue, but for a very specific combination of three things that must all happen:
   
   1) parsing the DAG in the DAGFileProcessor produced the right structure of the serialized DAG, and the scheduler already scheduled the task
   2) but parsing the DAG in the worker fails to produce the actual task "code" to run for this task
   3) and it is either an intermittent issue (like a connectivity problem) or an issue where parsing DAGs on the worker produces a different result than parsing them in the DAGFileProcessor (which might be because the DAG author made wrong assumptions).
   
   If all three things happen (which is not that unlikely in a real environment), then printing the error details to the task log (which the worker can write to at the moment parsing the DAG fails during the "airflow task" command) means they might be seen by those who operate Airflow rather than only by cluster admins.
   
   Note: the 3rd point above is important, because if the issue is not intermittent, the task (and its log) will not even be visible in the UI to look at. This is why in https://github.com/apache/airflow/discussions/29984 I initially qualified it as a "versioning" issue (because if the task disappears, we currently have no way to see the log anyway). But I agree that if the issue is intermittent, or if the worker consistently fails to parse the DAG while the DAGFileProcessor parses it fine, then it might help a lot in diagnosing the issue.
   
   This might be quite an improvement vs. the current approach where the errors are just printed out in the worker stdout and visible only to cluster admin people.
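
To make the idea concrete, here is a minimal sketch in plain Python of catching a DAG import failure on the worker side and writing the traceback to a task-level logger instead of only to stdout. This is not Airflow code: `load_dag_module`, `demo`, the `hypothetical.task_log` logger name, and the `broken_dag.py` file are all made up for illustration, and the real worker would write through its own task log handler.

```python
import importlib.util
import io
import logging
import tempfile
import traceback
from pathlib import Path


def load_dag_module(path, task_log):
    """Import a DAG file (hypothetical helper, not an Airflow API).

    On any import-time failure, write the full traceback to `task_log`
    and return None instead of letting the error go only to stdout.
    """
    try:
        spec = importlib.util.spec_from_file_location(Path(path).stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    except Exception:
        task_log.error(
            "Failed to parse DAG file %s on the worker:\n%s",
            path,
            traceback.format_exc(),
        )
        return None


def demo():
    """Simulate a DAG file that fails at import time on the worker."""
    stream = io.StringIO()  # stands in for the task log destination
    task_log = logging.getLogger("hypothetical.task_log")
    task_log.addHandler(logging.StreamHandler(stream))
    task_log.setLevel(logging.ERROR)
    task_log.propagate = False  # keep the message out of the root logger

    with tempfile.TemporaryDirectory() as tmp:
        dag_file = Path(tmp) / "broken_dag.py"
        # An import-time failure, e.g. a connectivity problem in top-level code.
        dag_file.write_text('raise ConnectionError("metadata service unreachable")\n')
        module = load_dag_module(dag_file, task_log)

    return module, stream.getvalue()
```

Calling `demo()` returns `None` for the module together with the captured log text, which contains the traceback of the import failure. In this sketch, that text is what someone operating Airflow would see in the task log.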
   

