Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/22 16:02:57 UTC

[GitHub] [airflow] BobasB opened a new issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

BobasB opened a new issue #9722:
URL: https://github.com/apache/airflow/issues/9722


   Hi, I am seeing very strange and specific Airflow behaviour on an AWS EKS cluster after deploying Calico to enforce network policies. I have also opened an AWS support case, but I need help from the Airflow team as well. I would appreciate any help.
   **What happened**:
   I have an Airflow set-up running as 2 k8s pods (Airflow webserver and scheduler). Both Airflow pods use a git-sync sidecar container to fetch DAGs from git and store them on a k8s `emptyDir` volume. Everything works well on a fresh EKS cluster, without errors. But the moment Calico is deployed to the EKS cluster (https://docs.aws.amazon.com/eks/latest/userguide/calico.html), all DAGs with local imports break. Airflow has a default k8s NetworkPolicy which allows all ingress/egress traffic without restrictions, and the Airflow UI is accessible. But the UI shows the message `DAG "helloWorld" seems to be missing.` and the Airflow webserver starts to log this error:
   ```
   [2020-07-08 14:43:38,784] {__init__.py:51} INFO - Using executor SequentialExecutor
   [2020-07-08 14:43:38,784] {dagbag.py:396} INFO - Filling up the DagBag from /usr/local/airflow/dags/repo
   [2020-07-08 14:43:38,785] {dagbag.py:225} DEBUG - Importing /usr/local/airflow/dags/repo/airflow_dags/dag_test.py
   [2020-07-08 14:43:39,016] {dagbag.py:239} ERROR - Failed to import: /usr/local/airflow/dags/repo/airflow_dags/dag_test.py
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 236, in process_file
       m = imp.load_source(mod_name, filepath)
     File "/usr/local/lib/python3.7/imp.py", line 171, in load_source
       module = _load(spec)
     File "<frozen importlib._bootstrap>", line 696, in _load
     File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
     File "<frozen importlib._bootstrap_external>", line 728, in exec_module
     File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
     File "/usr/local/airflow/dags/repo/airflow_dags/dag_test.py", line 5, in <module>
       from airflow_dags.common import DEFAULT_ARGS
   ModuleNotFoundError: No module named 'airflow_dags'
   ```
   
   The DAG itself consists of 2 files: `dag_test.py` and `common.py`. The contents of the files are:
   `common.py`
   ```
   from datetime import datetime, timedelta
   
   DEFAULT_ARGS = {
       'owner': 'airflow',
       'depends_on_past': False,
       'start_date': datetime(2020, 3, 26),
       'retry_delay': timedelta(minutes=1),
   }
   ```
   
   `dag_test.py` 
   ```
   from airflow import DAG
   from airflow.operators.bash_operator import BashOperator
   
   from airflow_dags.common import DEFAULT_ARGS
   
   dag = DAG('helloWorld', schedule_interval='*/5 * * * *', default_args=DEFAULT_ARGS)
   
   t1 = BashOperator(
       task_id='task_1',
       bash_command='echo "Hello World from Task 1"; sleep 30',
       dag=dag
   )
   ```
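
   For `from airflow_dags.common import DEFAULT_ARGS` to resolve, `airflow_dags` has to be importable as a package from a directory on `sys.path`. Judging from the log paths above, the checkout presumably looks roughly like this (an assumed layout, not stated explicitly in this report):
   ```
   /usr/local/airflow/dags/repo/       <- on PYTHONPATH
   `-- airflow_dags/
       |-- __init__.py                 <- needed for the package import
       |-- common.py
       `-- dag_test.py
   ```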
   
   *What I have already tried in the webserver and scheduler pods*:
   - ssh into the Airflow pod and start a Python shell; all imports work fine, for example:
   ```
   airflow@airflow-webserver-78bc695cc7-l7z9s:~$ pwd
   /usr/local/airflow
   airflow@airflow-webserver-78bc695cc7-l7z9s:~$ python
   Python 3.7.4 (default, Oct 17 2019, 06:10:02)
   [GCC 8.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from airflow_dags.common import DEFAULT_ARGS
   >>> print(DEFAULT_ARGS)
   {'owner': 'airflow', 'depends_on_past': False, 'start_date': datetime.datetime(2020, 3, 26, 0, 0), 'retry_delay': datetime.timedelta(seconds=60)}
   >>>
   ```
   - from the pod's bash shell, I can run airflow commands such as `list_tasks`, and the DAG is not broken:
   ```
   airflow@airflow-webserver-78bc695cc7-l7z9s:~$ airflow list_tasks helloWorld
   [2020-07-08 15:37:24,309] {settings.py:212} DEBUG - Setting up DB connection pool (PID 275)
   [2020-07-08 15:37:24,310] {settings.py:253} DEBUG - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=275
   [2020-07-08 15:37:24,366] {cli_action_loggers.py:42} DEBUG - Adding <function default_action_log at 0x7fb9b5a4f710> to pre execution callback
   [2020-07-08 15:37:24,817] {cli_action_loggers.py:68} DEBUG - Calling callbacks: [<function default_action_log at 0x7fb9b5a4f710>]
   [2020-07-08 15:37:24,847] {__init__.py:51} INFO - Using executor SequentialExecutor
   [2020-07-08 15:37:24,848] {dagbag.py:396} INFO - Filling up the DagBag from /usr/local/airflow/dags/repo
   [2020-07-08 15:37:24,849] {dagbag.py:225} DEBUG - Importing /usr/local/airflow/dags/repo/airflow_dags/dag_test.py
   [2020-07-08 15:37:25,081] {dagbag.py:363} DEBUG - Loaded DAG <DAG: helloWorld>
   [2020-07-08 15:37:25,082] {dagbag.py:225} DEBUG - Importing /usr/local/airflow/dags/repo/airflow_dags/dagbg_add.py
   task_1
   [2020-07-08 15:37:25,083] {cli_action_loggers.py:86} DEBUG - Calling callbacks: []
   [2020-07-08 15:37:25,083] {settings.py:278} DEBUG - Disposing DB connection pool (PID 275)
   
   airflow@airflow-webserver-78bc695cc7-l7z9s:~$ airflow trigger_dag helloWorld
   [2020-07-08 15:50:25,446] {settings.py:212} DEBUG - Setting up DB connection pool (PID 717)
   [2020-07-08 15:50:25,446] {settings.py:253} DEBUG - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=717
   [2020-07-08 15:50:25,502] {cli_action_loggers.py:42} DEBUG - Adding <function default_action_log at 0x7fe05c254710> to pre execution callback
   [2020-07-08 15:50:25,986] {cli_action_loggers.py:68} DEBUG - Calling callbacks: [<function default_action_log at 0x7fe05c254710>]
   [2020-07-08 15:50:26,024] {__init__.py:51} INFO - Using executor SequentialExecutor
   [2020-07-08 15:50:26,024] {dagbag.py:396} INFO - Filling up the DagBag from /usr/local/airflow/dags/repo/airflow_dags/dag_test.py
   [2020-07-08 15:50:26,024] {dagbag.py:225} DEBUG - Importing /usr/local/airflow/dags/repo/airflow_dags/dag_test.py
   [2020-07-08 15:50:26,253] {dagbag.py:363} DEBUG - Loaded DAG <DAG: helloWorld>
   Created <DagRun helloWorld @ 2020-07-08 15:50:26+00:00: manual__2020-07-08T15:50:26+00:00, externally triggered: True>
   [2020-07-08 15:50:26,289] {cli_action_loggers.py:86} DEBUG - Calling callbacks: []
   [2020-07-08 15:50:26,289] {settings.py:278} DEBUG - Disposing DB connection pool (PID 717)
   ```
   
   *To summarise*: Airflow DAGs that have local imports appear broken in the UI and in the webserver logs, but are still executable via a manual trigger, when using an EKS cluster with Calico network policies.
   
   Please help me understand why the DAG imports break in the UI.
   
   **Apache Airflow version**: 1.10.10
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   ```
   Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
   ```
   **Environment**:
   - **Cloud provider or hardware configuration**: AWS, EKS
   - **OS** (e.g. from /etc/os-release):
   EKS worker nodes, EC2 instances:
   ```
   NAME="Amazon Linux"
   VERSION="2"
   ID="amzn"
   ID_LIKE="centos rhel fedora"
   VERSION_ID="2"
   PRETTY_NAME="Amazon Linux 2"
   ANSI_COLOR="0;33"
   CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
   HOME_URL="https://amazonlinux.com/"
   ```
   Docker image with Airflow installed:
   ```
   PRETTY_NAME="Debian GNU/Linux 10 (buster)"
   NAME="Debian GNU/Linux"
   VERSION_ID="10"
   VERSION="10 (buster)"
   VERSION_CODENAME=buster
   ID=debian
   HOME_URL="https://www.debian.org/"
   SUPPORT_URL="https://www.debian.org/support"
   BUG_REPORT_URL="https://bugs.debian.org/"
   ```
   - **Kernel** (e.g. `uname -a`):
   ```
   Linux airflow-webserver-78bc695cc7-dmzh2 4.14.181-140.257.amzn2.x86_64 #1 SMP Wed May 27 02:17:36 UTC 2020 x86_64 GNU/Linux
   ```
   - **Install tools**: we use `pipenv` to install Airflow system-wide: `pipenv install --system --deploy --clear`
   - **Others**:
   
   **How to reproduce it**:
   Create an EKS cluster and deploy Calico. Use a DAG with local imports.
   
   **Anything else we need to know**:
   I have set all the required environment variables, such as `AIRFLOW_HOME=/usr/local/airflow`, `AIRFLOW_DAGS_FOLDER=/usr/local/airflow/dags/repo` and `PYTHONPATH=/usr/local/airflow/dags/repo`, and on an EKS cluster without network policies everything works fine.
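
   A non-interactive way to repeat the import check from outside the pod (pod name as in the shell prompts above; the namespace is an assumption):
   ```
   kubectl exec -n airflow airflow-webserver-78bc695cc7-l7z9s -- \
     python -c "from airflow_dags.common import DEFAULT_ARGS; print(DEFAULT_ARGS)"
   ```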





[GitHub] [airflow] jasontr commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
jasontr commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-708882057


   @BobasB Hi, I ran into the same situation and want to share some information. I did some experiments:
   - new module, referred to (imported) by an existing DAG, in a directory that existed before airflow started -> import succeeds
   - new module, referred to (imported) by an existing DAG, in a directory created after airflow started -> import fails
   - new DAG file in a directory that existed before airflow started -> import succeeds
   - new DAG file in a directory created after airflow started -> import succeeds
   
   All of these directories and files are under `{AIRFLOW_HOME}/dags`.
   
   In my opinion, DAG files can be scanned by airflow as standalone python scripts, but modules cannot be.
   
   Hopefully this helps.





[GitHub] [airflow] BobasB commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
BobasB commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-680962851


   Hi,
   For those interested in this issue: I have solved it for myself. In short, the problem was the missing git-sync init-container for the Webserver.
   In my deployment, the Scheduler has init + side-car containers for git-sync, while the Webserver has only the side-car container for git-sync.
   - Without network policies, the git-sync container starts faster than the Airflow Webserver, so Airflow sees this PATH and can import from it.
   - When network policies were applied, Airflow started faster than the git-sync side-car container and did NOT see the PATH for imports. (Also, if I start another Airflow Webserver with another PID/k8s port inside the already running container, it works correctly.)
   
   By adding a git-sync init container to both the Webserver and the Scheduler, the dags volume is always initialized before Airflow starts.
   
   Maybe it is an Airflow bug, but the problem is: when Airflow starts and the PATH does not exist yet, it never becomes visible for imports later.
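
   This matches how CPython's import machinery behaves: if an import is attempted while a `sys.path` entry does not exist yet, a negative finder entry is cached for that path, and later imports keep failing until `importlib.invalidate_caches()` is called. A self-contained sketch of the effect (CPython 3.7+ semantics; the directory and module names are illustrative):
   ```python
   import importlib
   import os
   import sys
   import tempfile
   
   pkg_root = os.path.join(tempfile.mkdtemp(), "dags_repo")  # does not exist yet
   sys.path.insert(0, pkg_root)
   
   try:
       import airflow_dags           # fails and caches a negative finder entry
   except ModuleNotFoundError:       # for the still-missing pkg_root
       pass
   
   # The directory appears later, the way a slow git-sync side-car delivers it.
   os.makedirs(os.path.join(pkg_root, "airflow_dags"))
   with open(os.path.join(pkg_root, "airflow_dags", "__init__.py"), "w") as f:
       f.write("DEFAULT_ARGS = {'owner': 'airflow'}\n")
   
   try:
       import airflow_dags           # still fails: the stale cache entry wins
   except ModuleNotFoundError:
       importlib.invalidate_caches()  # drops the stale entry...
       import airflow_dags            # ...and now the import succeeds
   
   print(airflow_dags.DEFAULT_ARGS)
   ```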





[GitHub] [airflow] tiranox85 edited a comment on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
tiranox85 edited a comment on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-833992404


   If someone else has the same issue: I solved it with a small change to my docker-compose.yml, shown below, so that the scheduler always starts after the webserver is healthy and all the DAGs show up OK:
   
   airflow-scheduler:
       <<: *airflow-common
       command: scheduler
       depends_on:
         airflow-webserver:
           condition: service_healthy
       restart: always
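
   Note that `condition: service_healthy` only takes effect if the webserver service defines a healthcheck. A sketch of one, assuming the stock image ships `curl` and relying on Airflow's `/health` endpoint (the timings are illustrative):
   
   airflow-webserver:
       <<: *airflow-common
       command: webserver
       healthcheck:
         test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always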





[GitHub] [airflow] potiuk commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824805540


   Ah, I see you added it :)





[GitHub] [airflow] boring-cyborg[bot] commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-655607231


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] ashb commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-810281810


   Enabling DAG serialization should fix the problem of what the webserver imports (by making it not import DAG code at all anymore): https://airflow.apache.org/docs/apache-airflow/1.10.14/dag-serialization.html
   
   (Or upgrade to 2.0, where that is the only mode of operation.)
   
   I'm closing this as I believe there is a workaround. Please let me know if anyone tries this and has problems, and we can re-open this issue.
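
   For reference, a minimal `airflow.cfg` sketch for switching serialization on in 1.10.x (option names as in the linked doc; the interval value is illustrative):
   ```
   [core]
   store_serialized_dags = True
   store_dag_code = True
   min_serialized_dag_update_interval = 30
   ```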





[GitHub] [airflow] dnskr commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
dnskr commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-868688744


   @fernhtls The issue is fixed by [PR#16339](https://github.com/apache/airflow/pull/16339), which is already merged and waiting to be released





[GitHub] [airflow] fernhtls edited a comment on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
fernhtls edited a comment on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-868308697


   I'm seeing similar behaviour to @Stormhand's, but when checking the scheduler / parser logs I see the following:
   
   ```
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 317, in _load_modules_from_file
       loader.exec_module(new_module)
     File "<frozen importlib._bootstrap_external>", line 848, in exec_module
     File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
     File "/opt/airflow/dags/repo/<git_sub_path>/workflows/<path>/dag_file.py", line 14, in <module>
       from custom_operators.<operator_something>  import <ClassOperatorSomething>
   ModuleNotFoundError: No module named 'custom_operators'
   ```
   
   **If you keep checking the logs above, at logs/scheduler/latest/<path_to_dag>/dag_file.py.log, the import error keeps showing up continuously.**
   
   **ps: I have redacted a few pieces of the paths and file names**
   
   **So we have some custom operators on a different path than the dags, but still below the dag bag directory.**
   
   I have tried pushing a PYTHONPATH env var and it didn't help; however, running python manually at the prompt with that PYTHONPATH set, the imports now work fine.
   
   We are using **DAG Serialization** with the following parameters:
   
   ```
   store_dag_code = True
   min_serialized_dag_update_interval = 30
   min_serialized_dag_fetch_interval = 10
   max_num_rendered_ti_fields_per_task = 30
   ```
   
   We are passing the usual git-sync arguments to the helm chart, plus setting the path to the submodule with `dags.gitSync.subPath`; DAG persistence is turned off.
   
   Airflow version 2.1.0, and we are using the apache-airflow helm chart version 1.0.0.
   
   * Could it really be that the PYTHONPATH is not set correctly for running the parser in our case?
     * The log above keeps showing that DAG import / parsing is not able to import the package, even though `airflow info` shows the whole dag bag path under `python_path`; yet when doing it manually with PYTHONPATH set, in a python console I can import the package without any problems.
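
   One way to test that is to set the variable through the chart itself, so every Airflow container (including the DAG parser) gets it. A sketch using the chart's top-level `env` list in values.yaml (the subPath segment is a placeholder, as above):
   ```
   env:
     - name: PYTHONPATH
       value: /opt/airflow/dags/repo/<git_sub_path>
   ```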





[GitHub] [airflow] akohlislb commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
akohlislb commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-680701966


   I think I have the same issue. If I use local python modules in a DAG, the Airflow Webserver UI fails to import them. If I delete the scheduler pod and it gets recreated, then the DAGs are fixed in the UI. Currently, I have to fix it manually by running a kubectl delete on the scheduler pod every time I deploy DAGs that import local python modules, but it should get picked up automatically.
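
   For the record, that manual fix is a one-liner; a sketch, assuming the chart labels the scheduler pod with `component=scheduler` (label keys vary between charts):
   ```
   kubectl delete pod -n airflow -l component=scheduler
   ```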





[GitHub] [airflow] dnskr commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
dnskr commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824746001


   Thanks for the reply @potiuk. I'm sure that there are no issues with permissions.
   The root of the problem is how airflow imports DAGs and modules. It looks like a race between the git-sync and scheduler containers.
   I have added a delay to the scheduler container [here](https://github.com/apache/airflow/blob/master/chart/templates/scheduler/scheduler-deployment.yaml#L113) and it solved the issue:
   ```
   args: ["bash", "-c", "sleep 60 && exec airflow scheduler"]
   ```
   Of course it is a very dirty hack, and the issue should be fixed some other way, for example by adding a git-sync container to initContainers.
   





[GitHub] [airflow] potiuk commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824730129


   Not sure if it matters, but can you check the ownership/access permissions of the files?
   Just run `ls -la` in your /opt/airflow/dags folder.
   Also, what user do you run airflow as?





[GitHub] [airflow] prongs commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
prongs commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-823212675


   This is still not working in airflow 2.0.
   
   Here is my setup
   
   The DAG bag is located at `/opt/airflow/dags`
   
   #### local module named `b`
   ```bash
   airflow@airflow-execteam-azuredatabricks-688d565566-gwgks:/opt/airflow/dags$ ls b
   __init__.py  __pycache__  b.py
   airflow@airflow-execteam-azuredatabricks-688d565566-gwgks:/opt/airflow/dags$ 
   ```
   
   
   #### dag code /opt/airflow/dags/a.py
   Just copied from the [tutorial](https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html), with the addition of `from b import *` as the first line.
   
   ```python
   from b import *
   
   from datetime import timedelta
   from textwrap import dedent
   
   # The DAG object; we'll need this to instantiate a DAG
   from airflow import DAG
   
   # Operators; we need this to operate!
   from airflow.operators.bash import BashOperator
   from airflow.utils.dates import days_ago
   # These args will get passed on to each operator
   # You can override them on a per-task basis during operator initialization
   default_args = {
       'owner': 'airflow',
       'depends_on_past': False,
       'email': ['airflow@example.com'],
       'email_on_failure': False,
       'email_on_retry': False,
       'retries': 1,
       'retry_delay': timedelta(minutes=5),
       # 'queue': 'bash_queue',
       # 'pool': 'backfill',
       # 'priority_weight': 10,
       # 'end_date': datetime(2016, 1, 1),
       # 'wait_for_downstream': False,
       # 'dag': dag,
       # 'sla': timedelta(hours=2),
       # 'execution_timeout': timedelta(seconds=300),
       # 'on_failure_callback': some_function,
       # 'on_success_callback': some_other_function,
       # 'on_retry_callback': another_function,
       # 'sla_miss_callback': yet_another_function,
       # 'trigger_rule': 'all_success'
   }
   with DAG(
       'tutorial',
       default_args=default_args,
       description='A simple tutorial DAG',
       schedule_interval=timedelta(days=1),
       start_date=days_ago(2),
       tags=['example'],
   ) as dag:
   
       # t1, t2 and t3 are examples of tasks created by instantiating operators
       t1 = BashOperator(
           task_id='print_date',
           bash_command='date',
       )
   
       t2 = BashOperator(
           task_id='sleep',
           depends_on_past=False,
           bash_command='sleep 5',
           retries=3,
       )
       dag.doc_md = __doc__
   
       t1.doc_md = dedent(
           """\
       #### Task Documentation
       You can document your task using the attributes `doc_md` (markdown),
       `doc` (plain text), `doc_rst`, `doc_json`, `doc_yaml` which gets
       rendered in the UI's Task Instance Details page.
   
       ![img](http://montcs.bloomu.edu/~bobmon/Semesters/2012-01/491/import%20soul.png)
       """
       )
       templated_command = dedent(
           """
       {% for i in range(5) %}
           echo "{{ ds }}"
           echo "{{ macros.ds_add(ds, 7)}}"
           echo "{{ params.my_param }}"
       {% endfor %}
       """
       )
   
       t3 = BashOperator(
           task_id='templated',
           depends_on_past=False,
           bash_command=templated_command,
           params={'my_param': 'Parameter I passed in'},
       )
   
       t1 >> [t2, t3]
   ```
   
   
   Now, `airflow dags list` reports
   
   ```
   dag_id                                         | filepath                                          | owner                   | paused
   ===============================================+===================================================+=========================+=======
   tutorial                                       | a.py                                              | airflow                 | None
   ```
   
   And `airflow dags report` reports
   
   ```
   file                                               | duration       | dag_num | task_num | dags                                          
   ===================================================+================+=========+==========+===============================================
   /a.py                                              | 0:00:00.295103 | 1       | 3        | tutorial                                      
   ```
   
   However, when I open the web UI, I see the following:
   
   ![image](https://user-images.githubusercontent.com/396205/115391509-2dac5a80-a1fd-11eb-8101-4d21c4b29594.png)
   
   
   Here's the version we're on 
   
   ![image](https://user-images.githubusercontent.com/396205/115391556-3a30b300-a1fd-11eb-9375-b3fddb2168bd.png)
   
   





[GitHub] [airflow] dnskr commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
dnskr commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824811273


   Could we reopen the issue so it can be linked in the PR with the fix?





[GitHub] [airflow] Stormhand commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
Stormhand commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-856184838


   I have the same issue on 2.1.0. If I add the libs to PYTHONPATH I can import them, but the UI still shows an error. I'm using the community helm charts with git-sync enabled.





[GitHub] [airflow] AdamLuckey commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
AdamLuckey commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-800677323


   We are also running into this, though we don't see remediation after restarting the scheduler or worker.





[GitHub] [airflow] potiuk commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-868694562


   Indeed. Closing as duplicate of #16339





[GitHub] [airflow] potiuk commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824805247


   Shouldn't git-sync (single pass) be added as an init-container? I guess that would solve the problem as well.
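
   A rough sketch of that idea, using the git-sync v3 image and flags; the repo URL, image version, and volume name are placeholders/assumptions:
   ```
   initContainers:
     - name: dags-git-sync-init
       image: k8s.gcr.io/git-sync/git-sync:v3.3.0
       args:
         - --repo=https://github.com/<org>/<dags-repo>.git
         - --branch=main
         - --root=/git
         - --dest=repo
         - --one-time        # single pass: clone once, then exit
       volumeMounts:
         - name: dags
           mountPath: /git
   ```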





[GitHub] [airflow] prongs commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
prongs commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824742477


   As a workaround we're going with packaged DAGs. That seems to work well. The only problem is that you can't see the full code in the web UI, which I hope airflow can fix in upcoming versions.
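
   For anyone trying the same route: a packaged DAG is just a zip placed in the dags folder, with the DAG files and their local modules at the top level of the archive. A sketch using the `a.py` / `b` layout from the earlier comment:
   ```
   cd /opt/airflow/dags
   zip -rq tutorial_dag.zip a.py b/   # DAG file and its module at the zip root
   ```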





[GitHub] [airflow] tatianguiqu commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
tatianguiqu commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-808006667


   Thank you very much! I met the same problem and fixed it by adding a git-sync init-container. Several days of confusion were finally resolved. I had tried modifying the directory structure and using PYTHONPATH and sys.path; none of those worked.





[GitHub] [airflow] ashb closed issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
ashb closed issue #9722:
URL: https://github.com/apache/airflow/issues/9722


   





[GitHub] [airflow] dnskr commented on issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
dnskr commented on issue #9722:
URL: https://github.com/apache/airflow/issues/9722#issuecomment-824716495


   @ashb I have the same case and issue as @prongs, but with Airflow 2.0.2 and the latest version of the Helm chart.
   Playing around with the directory structure, PYTHONPATH and sys.path didn't help.





[GitHub] [airflow] potiuk closed issue #9722: Airflow can't import DAG in UI and logs, but manual DAG trigger works

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #9722:
URL: https://github.com/apache/airflow/issues/9722


   

