Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/15 13:40:14 UTC

[GitHub] [airflow] shivanshs9 opened a new issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

shivanshs9 opened a new issue #10341:
URL: https://github.com/apache/airflow/issues/10341


   
   **Apache Airflow version**: 1.10.11
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release): 
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   I am using an Airflow plugin to generate dynamic DAGs, and Airflow successfully loads the new ORM DAG into the DB. Hence the DAG listing on the home page is also updated. However, trying to refresh the DAG or open the graph view causes an error:
   ```
   [2020-08-15 13:12:01,862] {{app.py:1891}} ERROR - Exception on /refresh [POST]
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/home/airflow/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www_rbac/decorators.py", line 121, in wrapper
       return f(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/security/decorators.py", line 109, in wraps
       return f(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www_rbac/decorators.py", line 56, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/db.py", line 74, in wrapper
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www_rbac/views.py", line 1941, in refresh
       appbuilder.sm.sync_perm_for_dag(dag_id, dag.access_control)
   AttributeError: 'NoneType' object has no attribute 'access_control'
   ```
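   
   To make the failure mode explicit: this gunicorn worker's long-lived DagBag was built before the new DAG was registered, so `DagBag.get_dag()` returns `None`, and the refresh view then dereferences `dag.access_control`. A minimal sketch of that path (only the final call is confirmed by the traceback; the dag_id is hypothetical):
   
   ```python
   from airflow.models import DagBag
   
   # DagBag built when this gunicorn worker started, before the new DAG existed.
   dagbag = DagBag()
   
   dag = dagbag.get_dag("my_newly_created_dag")  # hypothetical dag_id -> returns None
   print(dag)  # None
   
   # The refresh view then effectively runs (final call confirmed by the traceback):
   #   appbuilder.sm.sync_perm_for_dag(dag_id, dag.access_control)
   # which raises AttributeError: 'NoneType' object has no attribute 'access_control'
   ```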
   
   **What you expected to happen**:
   
   I expected refresh, trigger, and the other functionalities to work fine.
   
   
   **How to reproduce it**:
   - Launch the Airflow webserver and scheduler as usual.
   - Create a new DAG at runtime by using the [dagen-airflow](https://github.com/shivanshs9/dagen-airflow) plugin.
   - Use the Dagen UI to create a new DAG and approve it.
   - Go to the Airflow homepage and you'll find the newly-created DAG listed there.
   - Click on the refresh link and the error pops up.
   
   **Anything else we need to know**:
   
   
   I understand that waiting for all Airflow web workers to restart and tweaking the `worker_refresh_interval` config would help here. After all, the issue is that the in-memory DagBag instances have not collected the new DAG yet.
   While a restart would help, I propose a boolean configuration option like `attempt_refresh_dagbag` (defaulting to `False` for backwards compatibility). If it is `True` and the DagBag doesn't have the DAG loaded (in this case, `DagBag.get_dag()` returns `None`), it would attempt to load the DAG directly by processing the file recorded in the `DagModel`; a rough sketch of this fallback follows below.
   This would be a better option for those who don't want to wait for the new DAG to sync with all workers. Plus, they could improve performance by increasing the `worker_refresh_interval` value and still work with new DAGs right away.
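   
   A minimal sketch of what this fallback could look like, assuming the Airflow 1.10.x APIs (`DagBag.process_file`, `DagModel.fileloc`, `settings.Session`); `get_dag_with_fallback` is a hypothetical helper name and `attempt_refresh_dagbag` is the proposed (not yet existing) option that would gate it:
   
   ```python
   # Hypothetical sketch of the proposed fallback; not part of Airflow itself.
   from airflow import settings
   from airflow.models import DagBag, DagModel
   
   
   def get_dag_with_fallback(dagbag, dag_id):
       """Return the DAG from the in-memory DagBag, falling back to the DagModel's file."""
       dag = dagbag.get_dag(dag_id)
       if dag is not None:
           return dag
   
       # Look up the file path recorded for this DAG in the metadata DB and parse it
       # directly, so a freshly-registered DAG becomes visible to this worker process.
       session = settings.Session()
       try:
           orm_dag = session.query(DagModel).filter(DagModel.dag_id == dag_id).first()
       finally:
           session.close()
       if orm_dag is None or not orm_dag.fileloc:
           return None
   
       dagbag.process_file(orm_dag.fileloc, only_if_updated=False)
       return dagbag.get_dag(dag_id)
   ```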





[GitHub] [airflow] boring-cyborg[bot] commented on issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #10341:
URL: https://github.com/apache/airflow/issues/10341#issuecomment-674397586


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] eladkal commented on issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #10341:
URL: https://github.com/apache/airflow/issues/10341#issuecomment-939428235


   This issue is reported against an old version of Airflow (which is end of life).
   If the issue is still present in the latest Airflow version, please let us know.





[GitHub] [airflow] shivanshs9 commented on issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #10341:
URL: https://github.com/apache/airflow/issues/10341#issuecomment-675642866


   > That will refresh DAGs from the DB if DAG Serialization is enabled; if not, it will refresh them from the DAG files.
   
   I understand that, but I'd again like to confirm that it'll only refresh the DAGs in the in-memory DagBag of the specific gunicorn worker process that received the request.
   With more than one web worker, trying to open the DAG details or trigger it will still randomly fail, since the "refresh all" request may be POSTed to some other worker.
   
   > I understand that waiting for all Airflow web workers to restart and tweaking the `worker_refresh_interval` config would help here. After all, the issue is that the in-memory DagBag instances have not collected the new DAG yet.
   > While a restart would help, I propose a boolean configuration option like `attempt_refresh_dagbag` (defaulting to `False` for backwards compatibility). If it is `True` and the DagBag doesn't have the DAG loaded (in this case, `DagBag.get_dag()` returns `None`), it would attempt to load the DAG directly by processing the file recorded in the `DagModel`.
   > This would be a better option for those who don't want to wait for the new DAG to sync with all workers. Plus, they could improve performance by increasing the `worker_refresh_interval` value and still work with new DAGs right away.
   
   What I meant above was a catch-all handler (optional and off by default) to get around this randomness of the bug.
   





[GitHub] [airflow] shivanshs9 commented on issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #10341:
URL: https://github.com/apache/airflow/issues/10341#issuecomment-675065066


   @kaxil that would refresh DAGs from the DB only for the process that received the POST request, right?
   I think it would still randomly fail when opening the DAG even after the "refresh all" button is clicked.
   
   Although it would still be a better solution, since it would allow one to attempt to refresh the in-memory DagBag from the DB in all the workers. 🤔





[GitHub] [airflow] kaxil commented on issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #10341:
URL: https://github.com/apache/airflow/issues/10341#issuecomment-674915606


   https://github.com/apache/airflow/pull/10328 should provide you with an endpoint to force-refresh all the DAGs.





[GitHub] [airflow] eladkal closed issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #10341:
URL: https://github.com/apache/airflow/issues/10341


   





[GitHub] [airflow] kaxil commented on issue #10341: Newly-created DAG is loaded in DB but existing DagBag instances are unable to get the DAG

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #10341:
URL: https://github.com/apache/airflow/issues/10341#issuecomment-675071337


   > @kaxil that would refresh DAGs from the DB only for the process that received the POST request, right?
   > I think it would still randomly fail when opening the DAG even after "refresh all" button is clicked.
   > 
   > Although it still would be a better solution since it would allow one to attempt to refresh the dagbag, in memory, from DB in all the workers. 🤔
   
   That will refresh DAGs from the DB if DAG Serialization is enabled; if not, it will refresh them from the DAG files.
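   
   For context, a minimal sketch of the "refresh from DB" case, assuming Airflow 1.10.10+ with DAG Serialization turned on (`[core] store_serialized_dags = True`; the dag_id is hypothetical):
   
   ```python
   from airflow.models import DagBag
   
   # With DAG Serialization enabled, the webserver's DagBag can be backed by the
   # serialized_dag table instead of re-parsing DAG files in each gunicorn worker.
   dagbag = DagBag(store_serialized_dags=True)
   dag = dagbag.get_dag("my_newly_created_dag")  # reads the serialized DAG from the DB
   ```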

