Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/30 09:21:40 UTC

[GitHub] [airflow] luoyuliuyin opened a new issue #15687: celery worker can't read dags from db

luoyuliuyin opened a new issue #15687:
URL: https://github.com/apache/airflow/issues/15687


   
   **Apache Airflow version**: 2.0.1
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release): centos 7.2
   - **Kernel** (e.g. `uname -a`): Linux ee822bbb1f68 5.10.25-linux
   - **Install tools**: pip
   - **Others**: My start commands:
   `airflow webserver`
   `airflow scheduler --do-pickle`
   `airflow celery worker`
   
   My environment configuration:
   `executor=CeleryExecutor`
   `broker_url=amqp://user:password@xxx:5672`
   `result_backend=db+mysql://user:password@xxx:3306/airflow`
   `sql_alchemy_conn=mysql://user:password@xxx:3306/airflow`
   `donot_pickle=False`
   `store_dag_code = True`
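
For reference, a sketch of where these options would live in `airflow.cfg` (section placement assumed from Airflow 2.0 defaults; hosts and credentials are placeholders exactly as in the report):

```ini
[core]
executor = CeleryExecutor
sql_alchemy_conn = mysql://user:password@xxx:3306/airflow
donot_pickle = False
store_dag_code = True

[celery]
broker_url = amqp://user:password@xxx:5672
result_backend = db+mysql://user:password@xxx:3306/airflow
```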
   
   **What happened**:
   The Celery worker can't read DAGs from the DB.
   
   **What you expected to happen**:
   The Celery worker should read DAGs from the DB, not from a local file.
   
   **How to reproduce it**:
   1. I created a DAG named `a_weixiang_test` in the `dags_folder` directory (the DAG file exists only on the scheduler; it was not synced to the webserver or worker).
   ![image](https://user-images.githubusercontent.com/28948186/116346707-58aa3600-a81d-11eb-8e4b-06a4c67fe7dc.png)
   
   
   2. Then I accessed the DAG through the UI and could view it, which means the webserver can read the DAG from the DB.
   ![image](https://user-images.githubusercontent.com/28948186/116346414-cd30a500-a81c-11eb-9b14-4ec6c78d511e.png)
   
   3. Triggering a new DAG run via the UI, I got the following error message:
   `Failed to execute task dag_id could not be found: a_weixiang_test. Either the dag did not exist or it failed to parse..`
   ![image](https://user-images.githubusercontent.com/28948186/116346239-6f9c5880-a81c-11eb-9c93-8a1290bde2d8.png)
   
   The worker should read DAGs from the DB, not from a local file.
   
   
   **Anything else we need to know**:
   
   I noticed that the data in the database does not meet expectations: all pickle-related fields of the `dag` table (`last_pickled`, `last_expired`, `pickle_id`) are empty, yet the `dag_pickle` table has records. Does this matter?
   ![image](https://user-images.githubusercontent.com/28948186/116346551-17b22180-a81d-11eb-8fd4-3d2c65a21f84.png)
   ![image](https://user-images.githubusercontent.com/28948186/116349032-b6408180-a821-11eb-9863-d1d5f00d3062.png)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #15687: celery worker can't read dags from db

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-930559599


   This issue is a misunderstanding of how Airflow works (currently): workers ALWAYS read DAGs from the DAGS_FOLDER. There is currently no way to make them read the DAGs from the DB. A worker MUST have the DAGs folder mounted locally; there is no other way at the moment.
   
   The DAGs stored in the DB are only used for display in the webserver, not for executing DAGs. Reading them from the DB is impossible for many reasons - for example, the DB does not contain all the other files those DAGs import.
   
   Getting rid of the requirement for workers to access the DAGS_FOLDER is not even planned yet (there are AIPs like the DAG Fetcher, still in Draft, that try to address this feature, but implementing it is not currently a priority):
   
   https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher
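
A toy model may make this failure mode concrete (illustrative only, not Airflow's actual code; the file and dag names are taken from the report above): the worker resolves a dag_id only from files in its own local DAGS_FOLDER, so a DAG file that exists only on the scheduler produces exactly the "dag_id could not be found" error from the report.

```python
# Toy model of worker-side DAG resolution (NOT Airflow's implementation):
# a process can only see dag_ids parsed from its *local* DAGS_FOLDER,
# never the serialized/display copies stored in the DB.
def find_dag_locally(dag_id, local_dag_files):
    """local_dag_files: dict mapping file name -> set of dag_ids it defines."""
    for dag_ids in local_dag_files.values():
        if dag_id in dag_ids:
            return dag_id
    raise LookupError(
        f"dag_id could not be found: {dag_id}. "
        "Either the dag did not exist or it failed to parse."
    )

scheduler_folder = {"a_weixiang_test.py": {"a_weixiang_test"}}  # file present
worker_folder = {}                                              # never synced

find_dag_locally("a_weixiang_test", scheduler_folder)  # scheduler side: found
```

Running the same lookup against `worker_folder` raises the `LookupError`, mirroring the failure in step 3 of the report.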





[GitHub] [airflow] AlessioC31 commented on issue #15687: celery worker can't read dags from db

AlessioC31 commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-929299865


   Any news? I have the same problem, [here](https://airflow.apache.org/docs/apache-airflow/2.0.1/executor/celery.html) they say ```The worker needs to have access to its DAGS_FOLDER, and you need to synchronize the filesystems by your own means. A common setup would be to store your DAGS_FOLDER in a Git repository and sync it across machines using Chef, Puppet, Ansible, or whatever you use to configure machines in your environment. If all your boxes have a common mount point, having your pipelines files shared there should work as well```
   
   So I guess this is intended?
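
The quoted guidance ("synchronize the filesystems by your own means") can be sketched minimally. This demo mirrors one directory into another locally so it is self-contained; a real setup would use git pulls, rsync, or a shared mount across machines, and the paths and file names here are placeholders:

```python
# Minimal stand-in for syncing DAGS_FOLDER from the scheduler host to a
# worker host (two local temp dirs here so the sketch is self-contained).
import pathlib
import shutil
import tempfile

scheduler_dags = pathlib.Path(tempfile.mkdtemp())  # placeholder for the scheduler's DAGS_FOLDER
worker_dags = pathlib.Path(tempfile.mkdtemp())     # placeholder for a worker's DAGS_FOLDER

(scheduler_dags / "a_weixiang_test.py").write_text("from airflow import DAG\n")

# The actual "sync" step; in practice this is rsync/git/NFS, not copytree.
shutil.copytree(scheduler_dags, worker_dags, dirs_exist_ok=True)
```

After the copy, the worker's folder contains the DAG file, which is the precondition Airflow's Celery executor assumes.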





[GitHub] [airflow] leonsmith edited a comment on issue #15687: celery worker can't read dags from db

leonsmith edited a comment on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-929573349


   So after tweaking a couple of things from my work in progress merge request (to link the pickle to the dag) this is still throwing an error when the task is picked up from the worker.
   
   ```python
   ModuleNotFoundError: No module named 'unusual_prefix_d83562dd8d4afa96cb7d254b46b193a7106a58ef_test_logging'
     File "airflow/executors/celery_executor.py", line 121, in _execute_in_fork
       args.func(args)
     File "airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "airflow/cli/commands/task_command.py", line 274, in task_run
       dag = get_dag_by_pickle(args.pickle)
     File "airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "airflow/utils/cli.py", line 221, in get_dag_by_pickle
       dag_pickle = session.query(DagPickle).filter(DagPickle.id == pickle_id).first()
     File "sqlalchemy/orm/query.py", line 3429, in first
       ret = list(self[0:1])
     File "sqlalchemy/orm/query.py", line 3203, in __getitem__
       return list(res)
     File "sqlalchemy/orm/loading.py", line 100, in instances
       cursor.close()
     File "sqlalchemy/util/langhelpers.py", line 68, in __exit__
       compat.raise_(
     File "sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "sqlalchemy/orm/loading.py", line 80, in instances
       rows = [proc(row) for row in fetch]
     File "sqlalchemy/orm/loading.py", line 80, in <listcomp>
       rows = [proc(row) for row in fetch]
     File "sqlalchemy/orm/loading.py", line 579, in _instance
       _populate_full(
     File "sqlalchemy/orm/loading.py", line 725, in _populate_full
       dict_[key] = getter(row)
     File "sqlalchemy/sql/sqltypes.py", line 1723, in process
       return loads(value)
     File "dill/_dill.py", line 327, in loads
       return load(file, ignore, **kwds)
     File "dill/_dill.py", line 313, in load
       return Unpickler(file, ignore=ignore, **kwds).load()
     File "dill/_dill.py", line 525, in load
       obj = StockUnpickler.load(self)
     File "dill/_dill.py", line 515, in find_class
       return StockUnpickler.find_class(self, module, name)
   ```
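
The `ModuleNotFoundError` above can be reproduced in miniature: Airflow imports each DAG file under a synthetic `unusual_prefix_...` module name, and stock pickling stores classes by module path, so unpickling in a process that never created that synthetic module fails. A hedged sketch (the module and class names below are made up for illustration):

```python
import pickle
import sys
import types

# A class whose module is a synthetic, dynamically created one -- analogous
# to the "unusual_prefix_..." names Airflow generates when parsing DAG files.
mod = types.ModuleType("unusual_prefix_demo_test_logging")

class Payload:
    pass

Payload.__module__ = mod.__name__   # class is "owned" by the synthetic module
mod.Payload = Payload
sys.modules[mod.__name__] = mod

data = pickle.dumps(Payload())      # pickles the class by module path

# Simulate the worker process, where that synthetic module was never created:
del sys.modules[mod.__name__]
try:
    pickle.loads(data)
except ModuleNotFoundError as exc:
    print(exc)
```

The unpickler's `find_class` tries to import the stored module name, which only ever existed in the parsing process, hence the traceback ending in `StockUnpickler.find_class`.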





[GitHub] [airflow] ashb commented on issue #15687: celery worker can't read dags from db

ashb commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-931205897


   It's not well used, and because it uses stock Pickle it probably doesn't work all that well (even bugs like this aside) :D





[GitHub] [airflow] ashb commented on issue #15687: celery worker can't read dags from db

ashb commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-931121744


   @potiuk That is only true by default: there is a setting called `donot_pickle`, which has a default value of True. If it is set to False, the DAG should be pickled into the database and `--pickle-id` passed to the worker, meaning that it _doesn't_ read it from disk:
   
   https://github.com/apache/airflow/blob/e7925d8255e836abd8912783322d61b3a9ff657a/airflow/cli/commands/task_command.py#L272-L276
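
A hedged paraphrase of the branch being described (the loader functions below are stubs standing in for `get_dag_by_pickle` and the file-based loader; they are not Airflow's real implementations):

```python
from types import SimpleNamespace

def get_dag_by_pickle(pickle_id):
    # Stub: the real function queries the DagPickle table in the DB.
    return f"dag loaded from DB (pickle_id={pickle_id})"

def get_dag(subdir, dag_id):
    # Stub: the real path parses DAG files from the local DAGS_FOLDER.
    return f"dag parsed from local files ({dag_id})"

def load_dag_for_task(args):
    # With donot_pickle=False, the scheduler pickles the DAG into the DB and
    # passes --pickle-id, so the worker takes the DB path instead of disk.
    if args.pickle:
        return get_dag_by_pickle(args.pickle)
    return get_dag(args.subdir, args.dag_id)

print(load_dag_for_task(SimpleNamespace(pickle=7, subdir=None, dag_id="a_weixiang_test")))
```

Without a pickle id (the default, `donot_pickle=True`), the same function falls through to the local-file path, which is why a worker normally needs the DAGS_FOLDER mounted.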











[GitHub] [airflow] microhuang edited a comment on issue #15687: celery worker can't read dags from db

microhuang edited a comment on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-901906001


   me too, +1.
   
   airflow==2.1.2 and airflow==2.1.3





[GitHub] [airflow] gutiou commented on issue #15687: celery worker can't read dags from db

gutiou commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-844778229


   Any update on it?  I have the same issue.





[GitHub] [airflow] potiuk commented on issue #15687: celery worker can't read dags from db

potiuk commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-932963375


   Agree with @kaxil 





[GitHub] [airflow] potiuk closed issue #15687: celery worker can't read dags from db

potiuk closed issue #15687:
URL: https://github.com/apache/airflow/issues/15687


   











[GitHub] [airflow] potiuk commented on issue #15687: celery worker can't read dags from db

potiuk commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-931182061


   Ah, TIL. I was not aware at all that we can have a worker without DAG files mounted locally.





[GitHub] [airflow] kaxil commented on issue #15687: celery worker can't read dags from db

kaxil commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-931532156


   FWIW -- We should remove this too in favour of DAG Versioning + Remote DAG Fetcher work sometime early next year.





[GitHub] [airflow] derkuci commented on issue #15687: celery worker can't read dags from db

derkuci commented on issue #15687:
URL: https://github.com/apache/airflow/issues/15687#issuecomment-1000744712


   Same here: I have dynamic DAGs (relatively slow to generate) and was hoping to use the "dag pickle" feature, but found #18584 and this.

