Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2018/03/08 12:51:00 UTC

[jira] [Closed] (AIRFLOW-2198) Heuristic in dag_processing list_py_file_paths sometimes ignores files containing DAG definitions

     [ https://issues.apache.org/jira/browse/AIRFLOW-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor closed AIRFLOW-2198.
--------------------------------------
    Resolution: Duplicate

> Heuristic in dag_processing list_py_file_paths sometimes ignores files containing DAG definitions
> -------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2198
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2198
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.8.2
>            Reporter: Jarosław Bojar
>            Priority: Minor
>
> In the function list_py_file_paths in the dag_processing module there is a heuristic that checks whether a file contains the words 'airflow' and 'DAG'. If a file does not contain both words, it is excluded from further processing:
> {code:python}
> # Heuristic that guesses whether a Python file contains an
> # Airflow DAG definition.
> might_contain_dag = True
> if safe_mode and not zipfile.is_zipfile(file_path):
>     with open(file_path, 'rb') as f:
>         content = f.read()
>         might_contain_dag = all(
>             [s in content for s in (b'DAG', b'airflow')])
> if not might_contain_dag:
>     continue
> {code}
> If a DAG is instantiated in a different file from the one that defines how it is built (for example, the DAG construction may live in a factory method), the file instantiating the DAG is ignored by this heuristic, and the DAG is not processed.
> For example:
> dag_factory.py:
> {code:python}
> from airflow import DAG
> def create_dag(dag_id, other_params...):
>   ...
>   return DAG(dag_id, ...){code}
> dag_instantiation.py:
> {code:python}
> from dag_factory import create_dag
> first_dag = create_dag('first', other_params...)
> second_dag = create_dag('second', other_params...){code}
> In this case, dag_factory.py is processed, but it does not instantiate any DAG, while dag_instantiation.py is ignored by the heuristic. Consequently, the DAGs are not created.
>  
> The function list_py_file_paths has a safe_mode parameter that can be used to turn off this heuristic, but the parameter is never passed when the function is called.
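
The behaviour described in the report can be reproduced without Airflow installed. The following is a minimal sketch, not the Airflow implementation: might_contain_dag and check_example_files are hypothetical helper names, the heuristic is re-stated from the snippet quoted above, and the two example files are simplified (the elided other_params are dropped). The files are only written to a temporary directory and scanned as bytes; they are never imported.

```python
import os
import tempfile
import zipfile


def might_contain_dag(file_path, safe_mode=True):
    """Hypothetical standalone re-statement of the quoted heuristic."""
    if safe_mode and not zipfile.is_zipfile(file_path):
        with open(file_path, 'rb') as f:
            content = f.read()
        # Both byte strings must appear somewhere in the file (case-sensitive).
        return all(s in content for s in (b'DAG', b'airflow'))
    # Heuristic disabled, or the path is a zip archive: do not filter.
    return True


def check_example_files(safe_mode=True):
    """Write simplified versions of the two example files and scan them."""
    files = {
        'dag_factory.py': (
            b"from airflow import DAG\n"
            b"def create_dag(dag_id):\n"
            b"    return DAG(dag_id)\n"
        ),
        'dag_instantiation.py': (
            b"from dag_factory import create_dag\n"
            b"first_dag = create_dag('first')\n"
            b"second_dag = create_dag('second')\n"
        ),
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name, source in files.items():
            path = os.path.join(tmp_dir, name)
            with open(path, 'wb') as f:
                f.write(source)
            results[name] = might_contain_dag(path, safe_mode=safe_mode)
    return results


if __name__ == '__main__':
    print(check_example_files(safe_mode=True))
    print(check_example_files(safe_mode=False))
```

With safe_mode=True the sketch keeps dag_factory.py and skips dag_instantiation.py, because the latter contains neither 'airflow' nor an upper-case 'DAG'. With safe_mode=False both files pass, which is the effect the unused parameter would have if callers set it.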



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)