You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/01 15:45:21 UTC

[GitHub] [airflow] kaxil opened a new pull request #18120: Handle DAGs with Parsing Errors

kaxil opened a new pull request #18120:
URL: https://github.com/apache/airflow/pull/18120


   Currently if there are parsing errors on an existing DAG, the DAG still remains in serialized DAG table so shows up in the Webserver and user can trigger it, leading to failures and confusion.
   
   This PR removes the Serialized DAGs if they have encountered parsing errors and deactivates them from `dag` table.
   
   The error already shows up in the UI:
   
   ![image](https://user-images.githubusercontent.com/8811558/132706500-e08c50b6-3286-43e1-b53b-11daea123ebf.png)
   
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-951433187


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916194592


   I think if we fix this then tasks that get stuck would always execute their callbacks. Should we remove this line in this PR:
   https://github.com/apache/airflow/blob/80eb8092d48ef5d14102bb2a11b44ef7cf0bbc95/airflow/jobs/scheduler_job.py#L529-L530
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] closed pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #18120:
URL: https://github.com/apache/airflow/pull/18120


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916168048


   > This would make it show a 404 (or maybe even a 500 in some cases cos we don't handle dag not found everywhere) -- neither of which feel like a good UX to me.
   
   Partially agree but this case is handled and redirects users to home and shows following on Tree View:
   
   ![image](https://user-images.githubusercontent.com/8811558/132707701-a8b35d18-c00d-4775-ab72-5955ecf6c60b.png)
   
   For Graph it shows the following which can and should be improved:
   
   ```
   Something bad has happened.
   
   Airflow is used by many users, and it is very likely that others had similar problems and you can easily find
   a solution to your problem.
   
   Consider following these steps:
   
     * gather the relevant information (detailed logs with errors, reproduction steps, details of your deployment)
   
     * find similar issues using:
        * GitHub Discussions
        * GitHub Issues
        * Stack Overflow
        * the usual search engine you use on a daily basis
   
     * if you run Airflow on a Managed Service, consider opening an issue using the service support channels
   
     * if you tried and have difficulty with diagnosing and fixing the problem yourself, consider creating a bug report.
       Make sure however, to include all relevant details and results of your investigation so far.
   
   Python version: 3.8.11
   Airflow version: 2.2.0.dev0
   Node: c869393dfa70
   -------------------------------------------------------------------------------
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/opt/airflow/airflow/www/auth.py", line 51, in decorated
       return func(*args, **kwargs)
     File "/opt/airflow/airflow/www/decorators.py", line 109, in view_func
       return f(*args, **kwargs)
     File "/opt/airflow/airflow/www/decorators.py", line 72, in wrapper
       return f(*args, **kwargs)
     File "/opt/airflow/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/airflow/airflow/www/views.py", line 2386, in graph
       dag = current_app.dag_bag.get_dag(dag_id)
     File "/opt/airflow/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/airflow/airflow/models/dagbag.py", line 186, in get_dag
       self._add_dag_from_db(dag_id=dag_id, session=session)
     File "/opt/airflow/airflow/models/dagbag.py", line 258, in _add_dag_from_db
       raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
   airflow.exceptions.SerializedDagNotFound: DAG 'Give-wrong-arg' not found in serialized_dag table
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916168048


   > This would make it show a 404 (or maybe even a 500 in some cases cos we don't handle dag not found everywhere) -- neither of which feel like a good UX to me.
   
   Partially agree but this case is handled and redirects users to home and shows following:
   
   ![image](https://user-images.githubusercontent.com/8811558/132707701-a8b35d18-c00d-4775-ab72-5955ecf6c60b.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#discussion_r706449385



##########
File path: airflow/dag_processing/processor.py
##########
@@ -647,3 +648,17 @@ def process_file(
             self.log.exception("Error logging import errors!")
 
         return len(dagbag.dags), len(dagbag.import_errors)
+
+    def delete_serialized_dags_with_parsing_errors(self, session, file_path):
+        from airflow.models.serialized_dag import SerializedDagModel
+
+        self.log.error("Deleting Serialized DAG(s) in %s file, please fix the parsing errors.", file_path)

Review comment:
       Good point, yes




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916166287


   This would make it show a 404 (or maybe even a 500 in some cases cos we don't handle dag not found everywhere) -- neither of which feel like a good UX to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916163793


   > Hmmmm, not sure about this -- by removing it it also means you can't view the tree/graph view for it.
   
   You shouldn't be able to --- it might show outdated or wrong info. For example I might have added more tasks or removed a task, while the historical view should show it (thinking of DAG Versioning) but not the current view I feel.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916168048


   > This would make it show a 404 (or maybe even a 500 in some cases cos we don't handle dag not found everywhere) -- neither of which feel like a good UX to me.
   
   Partially agree but this case is handled and redirects users to home and shows following on Tree View:
   
   ![image](https://user-images.githubusercontent.com/8811558/132707701-a8b35d18-c00d-4775-ab72-5955ecf6c60b.png)
   
   For Graph it shows the following:
   
   ```
   Something bad has happened.
   
   Airflow is used by many users, and it is very likely that others had similar problems and you can easily find
   a solution to your problem.
   
   Consider following these steps:
   
     * gather the relevant information (detailed logs with errors, reproduction steps, details of your deployment)
   
     * find similar issues using:
        * GitHub Discussions
        * GitHub Issues
        * Stack Overflow
        * the usual search engine you use on a daily basis
   
     * if you run Airflow on a Managed Service, consider opening an issue using the service support channels
   
     * if you tried and have difficulty with diagnosing and fixing the problem yourself, consider creating a bug report.
       Make sure however, to include all relevant details and results of your investigation so far.
   
   Python version: 3.8.11
   Airflow version: 2.2.0.dev0
   Node: c869393dfa70
   -------------------------------------------------------------------------------
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/opt/airflow/airflow/www/auth.py", line 51, in decorated
       return func(*args, **kwargs)
     File "/opt/airflow/airflow/www/decorators.py", line 109, in view_func
       return f(*args, **kwargs)
     File "/opt/airflow/airflow/www/decorators.py", line 72, in wrapper
       return f(*args, **kwargs)
     File "/opt/airflow/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/airflow/airflow/www/views.py", line 2386, in graph
       dag = current_app.dag_bag.get_dag(dag_id)
     File "/opt/airflow/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/airflow/airflow/models/dagbag.py", line 186, in get_dag
       self._add_dag_from_db(dag_id=dag_id, session=session)
     File "/opt/airflow/airflow/models/dagbag.py", line 258, in _add_dag_from_db
       raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
   airflow.exceptions.SerializedDagNotFound: DAG 'Give-wrong-arg' not found in serialized_dag table
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916163311


   Hmmmm, not sure about this -- by removing it it also means you can't view the tree/graph view for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916184088


   Even "Dag is missing" is confusing/misleading and a nicer UX would at least be to customize that message:
   
   "The file x where dag_id was last seen has a syntax error".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SamWheating commented on a change in pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
SamWheating commented on a change in pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#discussion_r705880182



##########
File path: airflow/dag_processing/processor.py
##########
@@ -647,3 +648,17 @@ def process_file(
             self.log.exception("Error logging import errors!")
 
         return len(dagbag.dags), len(dagbag.import_errors)
+
+    def delete_serialized_dags_with_parsing_errors(self, session, file_path):
+        from airflow.models.serialized_dag import SerializedDagModel
+
+        self.log.error("Deleting Serialized DAG(s) in %s file, please fix the parsing errors.", file_path)

Review comment:
       If a DAG file has a parsing error, this will be logged every single time the file is processed, even though the serialized DAG is only deleted the first time that this function is run.
   
   Maybe we should only log when a serialized run is actually deleted?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916175100


   The other option is to show an outdated view with RED WARNING Boxes. However, what if access_control is changed in the updated DAGs, should the group of users that should no longer have permissions be still shown the actual DAG, it's an edge case but one that has security implications.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916180058


   > what if access_control is changed in the updated DAGs
   
   If there's a parsing error then _no_ changes can apply, so I don't think that really matters.
   
   The main reason I don't like this -- if you tried to fix an error in your task and introduce a syntax error, you can no longer go and see the previous task logs, as you'd get a 404!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SamWheating commented on a change in pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
SamWheating commented on a change in pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#discussion_r705880182



##########
File path: airflow/dag_processing/processor.py
##########
@@ -647,3 +648,17 @@ def process_file(
             self.log.exception("Error logging import errors!")
 
         return len(dagbag.dags), len(dagbag.import_errors)
+
+    def delete_serialized_dags_with_parsing_errors(self, session, file_path):
+        from airflow.models.serialized_dag import SerializedDagModel
+
+        self.log.error("Deleting Serialized DAG(s) in %s file, please fix the parsing errors.", file_path)

Review comment:
       If a DAG file has a parsing error, this will be logged every single time the file is processed, even though the serialized DAG is only deleted the first time that this function is run.
   
   Maybe we should only log this a serialized run is actually deleted?

##########
File path: airflow/dag_processing/processor.py
##########
@@ -647,3 +648,17 @@ def process_file(
             self.log.exception("Error logging import errors!")
 
         return len(dagbag.dags), len(dagbag.import_errors)
+
+    def delete_serialized_dags_with_parsing_errors(self, session, file_path):
+        from airflow.models.serialized_dag import SerializedDagModel
+
+        self.log.error("Deleting Serialized DAG(s) in %s file, please fix the parsing errors.", file_path)

Review comment:
       If a DAG file has a parsing error, this will be logged every single time the file is processed, even though the serialized DAG is only deleted the first time that this function is run for a given file.
   
   Maybe we should only log this a serialized run is actually deleted?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916163793


   > Hmmmm, not sure about this -- by removing it it also means you can't view the tree/graph view for it.
   
   You shouldn't be able to --- it might show outdated or wrong info


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916163793


   > Hmmmm, not sure about this -- by removing it it also means you can't view the tree/graph view for it.
   
   You shouldn't be able to --- it might show outdated or wrong info. For example I might have added more tasks or removed a task, while the historical view should show it (thinking of DAG Versioning) but not the current view I feel.
   
   Personally, DAG Parsing error should be equivalent to "file does not exists" to be crystal clear to user that it is not available for triggering or clearing or any such operation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #18120: Handle DAGs with Parsing Errors

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #18120:
URL: https://github.com/apache/airflow/pull/18120#issuecomment-916182978


   I think disabling the trigger button, and stopping scheduler creating dagruns is a good idea, but I'm just very uneasy about making any pages suddenly start 404ing as a result of the bad deploy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org