Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/06 07:57:38 UTC

[GitHub] [airflow] arch-DJ opened a new issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

arch-DJ opened a new issue #13504:
URL: https://github.com/apache/airflow/issues/13504


   **Apache Airflow version**: 2.0
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): Not relevant
   
   **Environment**: 
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release): CentOS Linux 7 (Core)
   - **Kernel** (e.g. `uname -a`): Linux us01odcres-jamuaar-0003 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
   
   - **Install tools**: PostgreSQL 12.2
   - **Others**:
   
   **What happened**:
   
   I have two DAG files, say dag1.py and dag2.py.
   dag1.py creates a static DAG, i.e. once it's parsed it creates one specific DAG.
   dag2.py creates dynamic DAGs based on JSON files kept in an external location.
   
   The static DAG (generated from dag1.py) has a task in a later stage that generates JSON files; these get picked up by dag2.py, which creates the dynamic DAGs.
   
   The dynamic DAGs that get created are unpaused by default and are scheduled once.
   This whole process used to work fine with Airflow 1.x, where DAG serialization was not mandatory and was turned off by default.
   
   But with Airflow 2.0 I occasionally get the following exception when the scheduler tries to schedule the dynamically generated DAGs.
   
   ```
   [2021-01-06 10:09:38,742] {scheduler_job.py:1293} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
       self._run_scheduler_loop()
     File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
       self._create_dag_runs(query.all(), session)
     File "/global/packages/python/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1557, in _create_dag_runs
       dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
     File "/global/packages/python/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
       return func(*args, **kwargs)
     File "/global/packages/python/lib/python3.7/site-packages/airflow/models/dagbag.py", line 171, in get_dag
       self._add_dag_from_db(dag_id=dag_id, session=session)
     File "/global/packages/python/lib/python3.7/site-packages/airflow/models/dagbag.py", line 227, in _add_dag_from_db
       raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
   airflow.exceptions.SerializedDagNotFound: DAG 'dynamic_dag_1' not found in serialized_dag table
   ```
   When I checked the serialized_dag table manually, I was able to see the DAG entry there.
   I found the last_updated column value to be **2021-01-06 10:09:38.757076+05:30**,
   whereas the exception was logged at **[2021-01-06 10:09:38,742]**, which is a little before the last_updated time.
   
   I think this means that the Scheduler looked for the DAG entry in the serialized_dag table before DagFileProcessor created it.
   
   Is this right, or could something else be going on here?
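
To make the timing concrete, the gap between the two timestamps quoted above can be computed directly. This is only a small sketch: it assumes the scheduler log line is written in the same +05:30 local time zone as the `last_updated` value, which the report does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the scheduler log uses the same +05:30 offset as last_updated.
LOCAL_TZ = timezone(timedelta(hours=5, minutes=30))

# Timestamp of the logged exception (the log format puts a comma before the milliseconds).
logged_at = datetime.strptime(
    "2021-01-06 10:09:38,742", "%Y-%m-%d %H:%M:%S,%f"
).replace(tzinfo=LOCAL_TZ)

# last_updated value read from the serialized_dag table.
last_updated = datetime.strptime(
    "2021-01-06 10:09:38.757076+05:30", "%Y-%m-%d %H:%M:%S.%f%z"
)

# Positive gap means the row was written after the exception was logged.
gap = last_updated - logged_at
print(gap.total_seconds())
```

Under that time zone assumption, the serialized row was written roughly 15 ms after the exception was logged, which is consistent with the lookup racing ahead of the write.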
   
   **What you expected to happen**:
   
   The Scheduler should start looking for the DAG entry in the serialized_dag table only after DagFileProcessor has added it.
   Here it seems that DagFileProcessor added the DAG entry to the **dag** table, and the scheduler immediately fetched this dag_id and tried to find it in the **serialized_dag** table before DagFileProcessor could add it there.
   
   **How to reproduce it**:
   It occurs occasionally and there is no well defined way to reproduce it.
   
   
   **Anything else we need to know**:
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] iameugenejo edited a comment on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
iameugenejo edited a comment on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-761978221


   Experiencing the error.
   ```bash
   Python version: 3.8.0
   Airflow version: 2.0.0
   Node: {REDACTED}
   -------------------------------------------------------------------------------
   Traceback (most recent call last):
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
       return func(*args, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/decorators.py", line 97, in view_func
       return f(*args, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
       return f(*args, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/views.py", line 1861, in tree
       dag = current_app.dag_bag.get_dag(dag_id)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/utils/session.py", line 65, in wrapper
       return func(*args, session=session, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/models/dagbag.py", line 171, in get_dag
       self._add_dag_from_db(dag_id=dag_id, session=session)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/models/dagbag.py", line 227, in _add_dag_from_db
       raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
   airflow.exceptions.SerializedDagNotFound: DAG '60040d7f94fe6dd7d7c8a95b' not found in serialized_dag table
   ```
   
   
   60+ dags are dynamically generated from a single file.
   
   I had to patch two places in the `scheduler_job.py` file where the scheduler kept dying. After that, inspecting the DAG from the web UI throws the above error.
   
   Here is the patch I applied -
   ```patch
   69d68
   < from airflow.exceptions import SerializedDagNotFound
   1558,1563c1557
   <             try:
   <                 dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
   <             except SerializedDagNotFound as e:
   <                 self.log.exception(e)
   <                 continue
   <
   ---
   >             dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
   1601,1606c1595
   <             try:
   <                 dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
   <             except SerializedDagNotFound as e:
   <                 self.log.exception(e)
   <                 continue
   <
   ---
   >             dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
   ```



[GitHub] [airflow] kaxil commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-793083882


   @shroffrushabh -- Can you post the steps to reproduce your case?



[GitHub] [airflow] doowhtron commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
doowhtron commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-759342920


   I have a similar problem. After the "airflow.exceptions.SerializedDagNotFound: DAG 'XXX' not found in serialized_dag table" is logged, the scheduler dies.



[GitHub] [airflow] rabsr commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
rabsr commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-839629796


   I am still facing the same issue on Airflow 2.0.2. Recently I upgraded Airflow from 2.0.1 to 2.0.2. Getting the same error logs when loading dag.



[GitHub] [airflow] kaxil commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-765457706


   Will be fixed for 2.0.1 -- currently aiming to release it in 2nd week of Feb



[GitHub] [airflow] kaxil commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-839696941


   > I am still facing the same issue on Airflow 2.0.2. Recently I upgraded Airflow from 2.0.1 to 2.0.2. Getting the same error logs when loading dag.
   
   Please raise a different issue with the steps to reproduce.



[GitHub] [airflow] boring-cyborg[bot] commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-755146205


   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] arch-DJ edited a comment on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
arch-DJ edited a comment on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-755900855


   I was going through the **sync_to_db** method in **dagbag.py**.
   Most probably this function is called when the parsed DAG has to be saved to the database.
   The sequence followed here is that the DAG is first saved into the **dag** table and then into the **serialized_dag** table.
   I think reversing the sequence should address the issue I am seeing.
   
   
                   try:
                       DAG.bulk_write_to_db(self.dags.values(), session=session)
   
                       # Write Serialized DAGs to DB, capturing errors
                       for dag in self.dags.values():
                           serialize_errors.extend(_serialze_dag_capturing_errors(dag, session))
                   except OperationalError:
                       session.rollback()
                       raise
   
   
   Can anyone please tell me if reversing the sequence is OK?
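
The ordering question can be illustrated with a toy model that does not use Airflow at all. In the sketch below, two plain Python sets stand in for the **dag** and **serialized_dag** tables, and `scheduler_lookup` and `sync` are hypothetical helpers invented for this illustration. A reader that runs between the two writes sees the inconsistent state only in the dag-first order. The model ignores transactions and how the real `sync_to_db` commits, so it illustrates the suspected race rather than confirming a fix.

```python
def scheduler_lookup(dag_table, serialized_table):
    """Return dag_ids visible in dag_table that have no serialized counterpart."""
    return sorted(dag_id for dag_id in dag_table if dag_id not in serialized_table)


def sync(dag_id, order):
    """Write dag_id to both toy tables in the given order, reading in between.

    Returns what a concurrent reader would find missing at the midpoint.
    """
    tables = {"dag": set(), "serialized_dag": set()}
    first, second = order
    tables[first].add(dag_id)
    # A concurrent scheduler loop happens to run exactly here.
    missing_midway = scheduler_lookup(tables["dag"], tables["serialized_dag"])
    tables[second].add(dag_id)
    return missing_midway


dag_first = sync("dynamic_dag_1", ("dag", "serialized_dag"))
serialized_first = sync("dynamic_dag_1", ("serialized_dag", "dag"))
print(dag_first)         # ['dynamic_dag_1']
print(serialized_first)  # []
```

In this toy model the serialized-first order never exposes a dag_id without its serialized row, but whether that holds in Airflow depends on session and commit behavior that the snippet above does not show.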



[GitHub] [airflow] kaxil closed issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #13504:
URL: https://github.com/apache/airflow/issues/13504


   




[GitHub] [airflow] grillorafael commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
grillorafael commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-765409095


   I'm having the same issue




[GitHub] [airflow] kaxil commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-766871089


   Can one of you try the solution mentioned in https://github.com/apache/airflow/pull/13893 please? @grillorafael @nik-davis @iameugenejo @adamtay82 
   
   And also provide a reproducible script (the one that generates dynamic DAGs).
   
   I will add another commit to that PR or a new PR so that Scheduler should be able to handle such cases too.



[GitHub] [airflow] doowhtron commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
doowhtron commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-759853201


   I temporarily fixed this by catching the SerializedDagNotFound exception in scheduler_job.py:
   
   ```python
   from airflow.exceptions import SerializedDagNotFound
   for dag_run in dag_runs:
       try:
           self._schedule_dag_run(dag_run, active_runs_by_dag_id.get(dag_run.dag_id, set()), session)
       except SerializedDagNotFound as e:
           self.log.exception(e)
   ```
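
The effect of this kind of guard can be shown in isolation. The following is a self-contained sketch, not Airflow code: `SerializedDagNotFound` here is a local stand-in class for the real exception, and `get_dag` and `schedule_all` are hypothetical helpers. It shows the loop logging the missing DAG and moving on instead of letting the exception end the whole loop.

```python
import logging

log = logging.getLogger("toy_scheduler")


class SerializedDagNotFound(Exception):
    """Local stand-in for airflow.exceptions.SerializedDagNotFound."""


def get_dag(dag_id, serialized_table):
    # Hypothetical lookup: raise if the DAG has not been serialized yet.
    if dag_id not in serialized_table:
        raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
    return serialized_table[dag_id]


def schedule_all(dag_ids, serialized_table):
    """Try every DAG; skip the ones that are not serialized yet."""
    scheduled, skipped = [], []
    for dag_id in dag_ids:
        try:
            scheduled.append(get_dag(dag_id, serialized_table))
        except SerializedDagNotFound:
            # Log the traceback and carry on with the remaining DAGs.
            log.exception("Skipping %s for this loop", dag_id)
            skipped.append(dag_id)
            continue
    return scheduled, skipped


scheduled, skipped = schedule_all(
    ["dag_a", "dynamic_dag_1", "dag_b"],
    {"dag_a": "A", "dag_b": "B"},
)
```

Here `dag_a` and `dag_b` are still processed while `dynamic_dag_1` is skipped. In a real scheduler a skipped DAG would presumably be picked up on a later loop, once its serialized row exists.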



[GitHub] [airflow] shroffrushabh edited a comment on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
shroffrushabh edited a comment on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-792721333


   <img width="1329" alt="Screenshot 2021-03-08 at 5 52 06 PM" src="https://user-images.githubusercontent.com/2037677/110321046-2a875180-8037-11eb-9e36-cdf73ff23c40.png">
   
   Hey @kaxil , I am still seeing the same issue in airflow 2.0.1. Any idea how I can fix this? My scheduler also goes into an unhealthy state after I see this error message.



[GitHub] [airflow] nik-davis commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
nik-davis commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-762806828


   Would just like to add our temporary solution that is helping us get around this issue, and seems to be working quite nicely. We've added a python script to run before starting the scheduler which will serialize any missing DAGs, so if it fails on this error it will be fixed the next time it starts up.
   
   Here's serialize_missing_dags.py:
   
   ```python
   from airflow.models import DagBag
   from airflow.models.serialized_dag import SerializedDagModel
   
   dag_bag = DagBag()
   
   # Check DB for missing serialized DAGs, and add them if missing
   for dag_id in dag_bag.dag_ids:
       if not SerializedDagModel.get(dag_id):
           dag = dag_bag.get_dag(dag_id)
           SerializedDagModel.write_dag(dag)
   ```
   Which we call before starting the scheduler: `python serialize_missing_dags.py && exec airflow scheduler`
   
   I hope this helps!




[GitHub] [airflow] iameugenejo edited a comment on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
iameugenejo edited a comment on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-761978221


   Experiencing the error.
   ```bash
   Python version: 3.8.0
   Airflow version: 2.0.0
   Node: {REDACTED}
   -------------------------------------------------------------------------------
   Traceback (most recent call last):
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/auth.py", line 34, in decorated
       return func(*args, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/decorators.py", line 97, in view_func
       return f(*args, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/decorators.py", line 60, in wrapper
       return f(*args, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/www/views.py", line 1861, in tree
       dag = current_app.dag_bag.get_dag(dag_id)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/utils/session.py", line 65, in wrapper
       return func(*args, session=session, **kwargs)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/models/dagbag.py", line 171, in get_dag
       self._add_dag_from_db(dag_id=dag_id, session=session)
     File "/root/airflow/2.0.0/venv/lib/python3.8/site-packages/airflow/models/dagbag.py", line 227, in _add_dag_from_db
       raise SerializedDagNotFound(f"DAG '{dag_id}' not found in serialized_dag table")
   airflow.exceptions.SerializedDagNotFound: DAG '60040d7f94fe6dd7d7c8a95b' not found in serialized_dag table
   ```
   
   
   60+ DAGs are dynamically generated from a single file.
   
   I had to patch two places in the `scheduler_job.py` file where the scheduler kept dying; after that, inspecting the DAG from the web UI throws the above error.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] adamtay82 commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
adamtay82 commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-759630981


   Can confirm similar for us.





[GitHub] [airflow] kaxil closed issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #13504:
URL: https://github.com/apache/airflow/issues/13504


   





[GitHub] [airflow] arch-DJ commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
arch-DJ commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-755900855


   I was going through the **sync_to_db** method in **dagbag.py**.
   This function is most likely called when a parsed DAG has to be saved to the database.
   It saves the DAG to the **dag** table first and then to the **serialized_dag** table.
   I think reversing that order should address the issue I am seeing.
   
   ```
   try:
       DAG.bulk_write_to_db(self.dags.values(), session=session)
   
       # Write Serialized DAGs to DB, capturing errors
       for dag in self.dags.values():
           serialize_errors.extend(_serialze_dag_capturing_errors(dag, session))
   except OperationalError:
       session.rollback()
       raise
   ```
   
   Can anyone please tell me if reversing the sequence is OK?
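
The ordering concern described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Airflow's code: it uses an in-memory SQLite database with two simplified tables, and it assumes a reader can observe the state between the two writes. The helper names `make_db`, `orphaned_dag_ids`, and `sync` are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory DB with simplified stand-ins for the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")
    return conn


def orphaned_dag_ids(conn):
    """Return dag_ids present in `dag` but missing from `serialized_dag`."""
    rows = conn.execute(
        "SELECT d.dag_id FROM dag d "
        "LEFT JOIN serialized_dag s ON s.dag_id = d.dag_id "
        "WHERE s.dag_id IS NULL ORDER BY d.dag_id"
    ).fetchall()
    return [row[0] for row in rows]


def sync(conn, dag_ids, serialized_first):
    """Write both tables in the chosen order.

    Returns the dag_ids a reader would find in `dag` without a matching
    `serialized_dag` row if it looked between the two writes.
    """

    def write_dag():
        conn.executemany("INSERT INTO dag VALUES (?)", [(d,) for d in dag_ids])

    def write_serialized():
        conn.executemany(
            "INSERT INTO serialized_dag VALUES (?, ?)", [(d, "{}") for d in dag_ids]
        )

    steps = [write_serialized, write_dag] if serialized_first else [write_dag, write_serialized]
    steps[0]()
    seen_between = orphaned_dag_ids(conn)  # hypothetical look-up between the writes
    steps[1]()
    return seen_between


if __name__ == "__main__":
    print("dag first:       ", sync(make_db(), ["dyn_1", "dyn_2"], serialized_first=False))
    print("serialized first:", sync(make_db(), ["dyn_1", "dyn_2"], serialized_first=True))
```

In this model, writing `dag` first leaves a window in which both DAG ids exist without a serialized counterpart, while writing `serialized_dag` first leaves no such window. Whether Airflow actually exposes that intermediate state depends on how the session is flushed and committed, which this sketch does not model.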





[GitHub] [airflow] kaxil commented on issue #13504: Scheduler is unable to find serialized DAG in the serialized_dag table

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13504:
URL: https://github.com/apache/airflow/issues/13504#issuecomment-766871089


   Can one of you try the solution mentioned in https://github.com/apache/airflow/pull/13893 please? @grillorafael @nik-davis @iameugenejo @adamtay82 
   
   And also provide a reproducible script (the one that generates dynamic DAGs).
   
   I will add another commit to that PR, or open a new PR, so that the Scheduler can handle such cases too.
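
As a rough illustration of what handling such cases could look like, the sketch below skips DAGs whose serialized form is missing instead of letting the loop fail. It is not the change made in the linked PR: `FakeDagBag` and `schedulable_dags` are hypothetical, and the exception class is a local stand-in for `airflow.exceptions.SerializedDagNotFound`.

```python
class SerializedDagNotFound(Exception):
    """Local stand-in for airflow.exceptions.SerializedDagNotFound."""


class FakeDagBag:
    """Hypothetical in-memory substitute for a DB-backed DagBag."""

    def __init__(self, serialized):
        self._serialized = dict(serialized)

    def get_dag(self, dag_id):
        try:
            return self._serialized[dag_id]
        except KeyError:
            raise SerializedDagNotFound(
                f"DAG '{dag_id}' not found in serialized_dag table"
            ) from None


def schedulable_dags(dagbag, dag_ids):
    """Return (found, skipped), skipping DAGs that are not serialized yet."""
    found, skipped = [], []
    for dag_id in dag_ids:
        try:
            found.append(dagbag.get_dag(dag_id))
        except SerializedDagNotFound:
            # Assumption: the DAG may appear on a later pass, so skip it for now.
            skipped.append(dag_id)
    return found, skipped


if __name__ == "__main__":
    bag = FakeDagBag({"static_dag": "<serialized static_dag>"})
    print(schedulable_dags(bag, ["static_dag", "dynamic_dag_1"]))
```

The design choice here is to treat a missing serialized DAG as a temporary condition for one DAG rather than a fatal error for the whole loop.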

