Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/20 17:32:07 UTC

[GitHub] [airflow] potiuk opened a new pull request #17757: Improves documentation about modules management

potiuk opened a new pull request #17757:
URL: https://github.com/apache/airflow/pull/17757


   This PR is the result of responding to a number of questions and issues
   Airflow users raised on Slack, Stack Overflow and in GitHub issues.

   While the document was really helpful, it did not address some of
   the questions people had and it did not have enough examples. It also
   did not warn people against some of the common mistakes they
   could make (such as using relative imports).

   The sequence of the document has also been slightly improved.
   It was not clear what the document was really about, as it had
   several independent chapters without a logical sequence. Now the
   document starts with an intro explaining the options you have,
   explains in general how Python loads packages and modules, and
   then goes deeper into the usual ways users add code to
   Airflow, ending with a description of how to prepare your custom
   package (which follows the relative frequency of the ways users
   add code to Airflow).
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694946565



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,190 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+Also make sure to :ref:`Add init file to your folders <add_init_py_to_your_folders>`.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)

Review comment:
       ```suggestion
   This is an example structure that you might have in your ``dags`` folder:
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in the ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.airflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder on ``PYTHONPATH`` (and you should keep only
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the DAGs are only in the ``my_custom_dags`` folder; the ``common_package`` should not be
+scanned by the scheduler when searching for DAGs, so we should ignore the ``common_package`` folder. You also
+want to ignore ``base_dag.py`` if you keep a base DAG there that ``my_dag_1.py`` and ``my_dag_2.py`` derive
+from. Your ``.airflowignore`` should then look like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running, dynamically adds three directories to ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few things you should watch out for when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your DAGs/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example above). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create an ``airflow/operators`` subfolder, it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``), even though Python 3 allows them.
+
+It is tempting to do something like this in ``my_dag_1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such a shared DAG using the full path (starting from the directory which is added to
+``PYTHONPATH``):
 
-5. Build the wheel:
+.. code-block:: python
 
-.. code-block:: bash
+   from my_company.my_custom_dags.base_dag import BaseDag  # This is cool
 
-    python setup.py bdist_wheel
+Relative imports are counter-intuitive: depending on how you start your Python code, they can behave
+differently. In Airflow the same DAG file might be parsed in different contexts (by the scheduler, by the
+worker or during tests) and in those cases relative imports might behave differently. Always use the full
+Python package path when you import anything in Airflow DAGs; this will save you a lot of trouble.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_.
 
-This will create a few directories in the project and the overall structure will
-look like following:
+Add ``__init__.py`` in package folders
+......................................
 
-.. code-block:: bash
+When you create folders you should add an empty ``__init__.py`` file to each of them. While in Python 3
+there is a concept of implicit namespace packages where you do not have to add those files to a folder,
+Airflow expects that the files are added to all the packages you create.

Review comment:
       Going to resolve this too (for context see the other thread). 







[GitHub] [airflow] potiuk commented on pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#issuecomment-904152014


   Thanks for the thorough comments @jedcunningham ! I think the only one remaining is about `__init__.py` presence - I'd love to hear what others think about it - is it really needed or is it my imagination :)?





[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694715558



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Just to add a few links on that, and my thoughts and context for ``__init__.py`` in relation to our DAGs. I've spent quite some time looking at how packages/namespaces work, and while I also had the impression at some point in time that "as of Python 3.3 we should drop the `__init__.py` altogether", my understanding of it is a bit different now.
   
   To focus the discussion - here is the PEP in question: https://www.python.org/dev/peps/pep-0420 
   
   And here is particular chapter about impact on import finders and loaders: https://www.python.org/dev/peps/pep-0420/#impact-on-import-finders-and-loaders 
   
   My understanding of it:
   
   * lack of `__init__.py` in the folder means that (providing that import finder and loader supports it), such folder might be detected as "implicit namespace package" and modules could be added to the package in their own namespace. Separate namespaces mean that the same package can be split across multiple folders. But it really depends on the capabilities of loader/importer. For example in setup.py you need to use `find_namespace_packages` instead of `find_packages` to find the namespace-packages
   
   * some tools/methods that look for packages might still not work well with namespace packages (they might expect the package = single folder approach, for example, and break, or might not find the namespace packages at all). It's less and less common, but it happens (for example in mypy you need to specify the `--namespace-packages` flag to find them: https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-namespace-packages)
   
   * things can easily break if the same package exists both as a `classic` package with `__init__.py` and as a namespace package. There are also a couple of legacy ways to provide namespace packages (not all compatible with each other), some of them described here: https://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python, but I think there are also some other ways. There are historically ways to modify `__init__.py` to turn the package into a namespaced one, but they have some limitations (for example you had to have exactly the same ``__init__.py`` content in all the incarnations of the same package in different namespaces). And if you mixed and matched the different ways of defining namespaces, they might or might not work and could behave `strangely`.
   
   * there is a serious performance implication of using "implicit namespace packages". There is even a note in PEP 420 that there is not even an intention to replace "classic" packages with implicit namespaces:
   
   > There is no intention to remove support of regular packages. If a developer knows that her package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an `__init__.py`). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created.
   
   I am not even sure if our DAG loader supports implicit namespaces, but even if it does I think we should discourage the use of implicit namespaces for DAGs - if not because of the potential clashes and conflicts with "classic" packages or "explicit namespaces" that people can by accident have on their pythonpath - but also because of performance.
   
   I'd love to hear your thoughts about it :)
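   For illustration, here is a minimal `setup.py` sketch showing that difference. It is only a sketch: the `my_company` name is the example package from this PR, and the `my_company_ns` tree and `my-company-dags` name are purely hypothetical.
   
   ```python
   # Assumed layout: my_company/ contains __init__.py files everywhere
   # ("classic" packages); a hypothetical my_company_ns/ tree without
   # __init__.py files would be an implicit namespace package (PEP 420).
   from setuptools import find_namespace_packages, find_packages, setup
   
   setup(
       name="my-company-dags",
       # find_packages() only picks up directories that contain __init__.py,
       # so implicit namespace packages are silently skipped here.
       packages=find_packages(include=["my_company", "my_company.*"]),
       # Shipping namespace packages would require find_namespace_packages()
       # instead, e.g.:
       # packages=find_namespace_packages(include=["my_company_ns*"]),
   )
   ```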
   
   







[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694715558



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Just to add a few links on that, and my thoughts and context for ``__init__.py`` in relation to our DAGs. I've spent quite some time looking at how packages/namespaces work, and while I also had the impression at some point in time that "as of Python 3.3 we should drop the `__init__.py` altogether", my understanding of it is a bit different now.
   
   To focus the discussion - here is the PEP in question: https://www.python.org/dev/peps/pep-0420 
   
   And here is particular chapter about impact on import finders and loaders: https://www.python.org/dev/peps/pep-0420/#impact-on-import-finders-and-loaders 
   
   My understanding of it:
   
   * lack of `__init__.py` in the folder means that (providing that import finder and loader supports it), such folder might be detected as "implicit namespace package" and modules could be added to the package in their own namespace. Separate namespaces mean that the same package can be split across multiple folders. But it really depends on the capabilities of loader/importer. For example in setup.py you need to use `find_namespace_packages` instead of `find_packages` to find the namespace-packages
   
   * some tools/methods that look for packages might still not work well with namespace packages (they might expect the package = single folder approach, for example, and break, or might not find the namespace packages at all). It's less and less common, but it happens, and in some cases the tools by default do not find the namespace packages and you have to explicitly enable it (for example in mypy you need to specify the `--namespace-packages` flag to find them: https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-namespace-packages)
   
   * things can easily break if the same package exists both as a `classic` package with `__init__.py` and as a namespace package. There are also a couple of legacy ways to provide namespace packages (not all compatible with each other), some of them described here: https://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python, but I think there are also some other ways. There are historically ways to modify `__init__.py` to turn the package into an `explicitly namespaced` one, but they have some limitations (for example you had to have exactly the same ``__init__.py`` content in all the incarnations of the same package in different namespaces). And if you mixed and matched the different ways of defining namespaces, they might or might not work and could behave `strangely`.
   
   * there is a serious performance implication of using "implicit namespace packages". There is even a note in PEP 420 that there is not even an intention to replace "classic" packages with implicit namespaces:
   
   > There is no intention to remove support of regular packages. If a developer knows that her package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an `__init__.py`). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created.
   
   I am not even sure if our DAG loader supports implicit namespaces, but even if it does I think we should discourage the use of implicit namespaces for DAGs - if not because of the potential clashes and conflicts with "classic" packages or "explicit namespaces" that people can by accident have on their pythonpath - but also because of performance.
   
   I'd love to hear your thoughts about it :)
   
   







[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694319921



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.

Review comment:
       I am not sure either :) removed it.







[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694324576



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in the ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.airflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder on ``PYTHONPATH`` (and you should keep only
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the DAGs are only in the ``my_custom_dags`` folder; the ``common_package`` should not be
+scanned by the scheduler when searching for DAGs, so we should ignore the ``common_package`` folder. You also
+want to ignore ``base_dag.py`` if you keep a base DAG there that ``my_dag_1.py`` and ``my_dag_2.py`` derive
+from. Your ``.airflowignore`` should then look like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running, dynamically adds three directories to ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few things you should watch out for when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your DAGs/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example above). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create an ``airflow/operators`` subfolder, it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``), even though Python 3 allows them.
+
+It is tempting to do something like this in ``my_dag_1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such a shared DAG using the full path (starting from the directory which is added to
+``PYTHONPATH``):
 
-5. Build the wheel:
+.. code-block:: python
 
-.. code-block:: bash
+   from my_company.my_custom_dags.base_dag import BaseDag  # This is cool
 
-    python setup.py bdist_wheel
+Relative imports are counter-intuitive: depending on how you start your Python code, they can behave
+differently. In Airflow the same DAG file might be parsed in different contexts (by the scheduler, by the
+worker or during tests) and in those cases relative imports might behave differently. Always use the full
+Python package path when you import anything in Airflow DAGs; this will save you a lot of trouble.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_.
 
-This will create a few directories in the project and the overall structure will
-look like following:
+Add ``__init__.py`` in package folders
+......................................
 
-.. code-block:: bash
+When you create folders you should add an empty ``__init__.py`` file to each of them. While in Python 3
+there is a concept of implicit namespace packages where you do not have to add those files to a folder,
+Airflow expects that the files are added to all the packages you create.

Review comment:
       Hmm. I believe this is not as simple as "Python 3 does not need __init__.py packages" - see my comment above. There is a good reason why we have __init__.py files in Airflow almost everywhere (except providers, which are a special case). But maybe I am wrong and we can remove them?







[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694319259



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       > Python 3 doesn't need __init__.py's, right, 
   
   That's not entirely correct... I also got that impression for some time, but (at least as I see it) it's a bit of a misunderstanding.
   
   Python 3 **can** turn regular folders into implicit packages with their own namespaces, but a number of tools discovering Python code (Airflow DAG loading included) do not deal with implicit namespaces and still require `__init__.py` files in the folders.
   
   Maybe I am wrong about it? @ashb? But that's the impression I have.
   
   I did separate it out, but I would love to have a good statement about it.  
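   
   A quick way to check what the interpreter actually resolved is the illustrative snippet below (`my_company` being the hypothetical package from the example in this PR):
   
   ```python
   # A regular package has an __init__.py and a concrete __file__;
   # an implicit namespace package (PEP 420) reports __file__ as None
   # and its __path__ is a _NamespacePath rather than a plain list.
   import importlib
   
   pkg = importlib.import_module("my_company")  # hypothetical package name
   print(getattr(pkg, "__file__", None))  # .../my_company/__init__.py or None
   print(type(pkg.__path__))              # list vs _NamespacePath
   ```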







[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694741298



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Summarising what this PR is all about in one paragraph (you can quote me on that, and I can add this paragraph to the "modules management" doc):
   
   > Both relative imports and implicit namespaces, while somewhat useful features of Python 3, have the potential of being misused and misunderstood, and can break the installation in implicit ways which are difficult to debug and understand. Rather than trying to make them work with Airflow DAG parsing, we strongly discourage people from using them. DAGs should be written using absolute imports and regular "classic" packages with `__init__.py` - for performance reasons, but most of all for the sanity of both the users and the committers trying to help them.
   
   😄 
   







[GitHub] [airflow] potiuk merged pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17757:
URL: https://github.com/apache/airflow/pull/17757


   







[GitHub] [airflow] jedcunningham commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694945287



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Thanks for the detailed response @potiuk, TIL. I knew about (some of) the nuances with `find_namespace_packages` vs `find_packages`, but the performance advantage is a great reason. I'm happy for us to encourage and show examples of "classic" packages. "For the sake of sanity" is also a really good argument :)







[GitHub] [airflow] jedcunningham commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694114919



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own Python code in your Airflow deployment,
+for example common code or libraries. You might want to generate DAGs using
+shared Python code and have several DAG Python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of the following ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``
+* add extra folders where you keep your code to ``PYTHONPATH``
+* package your code into Python package and install it together with Airflow.

Review comment:
       ```suggestion
   * package your code into a Python package and install it together with Airflow.
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       ```suggestion
   it using the full Python path of the files. Take as an example such structure
   ```
   
   Python 3 doesn't need `__init__.py`'s, right, particularly when you're just dealing with a normal dir being added to `PYTHONPATH`?

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in the ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.airflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder on ``PYTHONPATH`` (and you should keep only
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the DAGs are only in the ``my_custom_dags`` folder; the ``common_package`` should not be
+scanned by the scheduler when searching for DAGs, so we should ignore the ``common_package`` folder. You also
+want to ignore ``base_dag.py`` if you keep a base DAG there that ``my_dag_1.py`` and ``my_dag_2.py`` derive
+from. Your ``.airflowignore`` should then look like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running, dynamically adds three directories to ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.

Review comment:
       ```suggestion
      The DAGS folder in Airflow 2 should not be shared with the webserver. While you can do it, unlike in Airflow 1.10,
      Airflow has no expectations that the DAGS folder is present in the webserver. In fact it's a bit of
      security risk to share the ``dags`` folder with the webserver, because it means that people who write DAGS
      can write code that the webserver will be able to execute (ideally the webserver should
      never run code which can be modified by users who write DAGs). Therefore if you need to share some code
      with the webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
      via installed packages (see below). Those folders are usually managed and accessible by different
      users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
      safe because they are part of configuration of the Airflow installation and controlled by the
      people managing the installation.
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in the ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.airflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder on ``PYTHONPATH`` (and you should keep only
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the DAGs are only in the ``my_custom_dags`` folder; the ``common_package`` should not be
+scanned by the scheduler when searching for DAGs, so we should ignore the ``common_package`` folder. You also
+want to ignore ``base_dag.py`` if you keep a base DAG there that ``my_dag_1.py`` and ``my_dag_2.py`` derive
+from. Your ``.airflowignore`` should then look like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running, dynamically adds three directories to ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``

Review comment:
       ```suggestion
   named ``airflow.operators`` and it will look there when importing ``from airflow.operators``.
   ```
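
    A minimal sketch of the name clash described in this hunk, assuming Airflow is installed in the same environment: `importlib` shows which location `airflow.operators` actually resolves to, so a local `airflow/operators` folder on `PYTHONPATH` is simply shadowed by the installed package.

    ```python
    import importlib.util

    # Resolve "airflow.operators" the same way "from airflow.operators import ..." would.
    spec = importlib.util.find_spec("airflow.operators")

    # Prints the location of the *installed* Airflow package, not a local
    # airflow/operators folder created next to your DAGs.
    print(spec.origin)
    ```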

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``

Review comment:
       ```suggestion
   * add your modules to one of the folders that Airflow automatically adds to ``PYTHONPATH``
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:

Review comment:
       ```suggestion
   In the case above, there are the ways you could import the python files:
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``) that were added in Python 3.
+
+This is tempting to do something like that it in ``my_dag1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such shared dag using full path (starting from the directory which is added to
+``PYTHONPATH``:

Review comment:
       ```suggestion
   ``PYTHONPATH``):
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``
+* add extra folders where you keep your code to ``PYTHONPATH``
+* package your code into Python package and install it together with Airflow.
+
+The chapter follows with general description of how Python loads packages and modules and dives
+deeper into the specific of each of the three possibilities above.
+
+How package/modules Loading in Python works

Review comment:
       ```suggestion
   How package/modules loading in Python works
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow

Review comment:
       ```suggestion
   ``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``
+* add extra folders where you keep your code to ``PYTHONPATH``
+* package your code into Python package and install it together with Airflow.
+
+The chapter follows with general description of how Python loads packages and modules and dives
+deeper into the specific of each of the three possibilities above.

Review comment:
       ```suggestion
   The next chapter has a general description of how Python loads packages and modules, and dives
   deeper into the specifics of each of the three possibilities above.
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.

Review comment:
       ```suggestion
   There are a few gotchas you should be careful about when you import your code.
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.

Review comment:
       ```suggestion
   as described below (the root directory which is on the ``PYTHONPATH``).
   ```
   
   I'm not sure what you meant by the portion I've removed. Maybe rephrase if it was important?

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -208,71 +314,101 @@ value as shown below:
     Python PATH: [/home/arch/venv/bin:/home/arch/projects/airflow_operators:/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/home/arch/venv/lib/python3.8/site-packages:/home/arch/airflow/dags:/home/arch/airflow/config:/home/arch/airflow/plugins]
 
 
-.. _additional-modules-in-airflow:
+Creating a package in Python
+----------------------------
 
-Additional modules in Airflow
------------------------------
+This is most organized way of adding your custom code. Thanks to using the packages,

Review comment:
       ```suggestion
   This is most organized way of adding your custom code. Thanks to using packages,
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``) that were added in Python 3.
+
+This is tempting to do something like that it in ``my_dag1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such shared dag using full path (starting from the directory which is added to
+``PYTHONPATH``:
 
-5. Build the wheel:
+.. code-block:: python
 
-.. code-block:: bash
+   from my_company.my_custom_dags.base_dag import BaseDag  # This is cool
 
-    python setup.py bdist_wheel
+The relative imports are counter-intuitive, depending on how you start your python code, they behave
+differently. In Airflow the same DAG file might be parsed in different context (by scheduler, by worker
+or during the tests) and in those cases, relatives imports might behave differently. Always use full
+python package path when you import anything in Airflow DAGs, this will save you a lot of troubles.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_
 
-This will create a few directories in the project and the overall structure will
-look like following:
+Add ``__init__.py`` in package folders
+......................................
 
-.. code-block:: bash
+When you create folders you should add ``__init__.py`` file as empty files in your folders. While in Python 3
+there is a concept of implicit namespaces where you do not have to add those files to folder, Airflow
+expects that the files are added to all packages you added.

Review comment:
       Curious, what breaks with this? My simple toy examples work without `__init__.py`'s?

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -208,71 +314,101 @@ value as shown below:
     Python PATH: [/home/arch/venv/bin:/home/arch/projects/airflow_operators:/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/home/arch/venv/lib/python3.8/site-packages:/home/arch/airflow/dags:/home/arch/airflow/config:/home/arch/airflow/plugins]
 
 
-.. _additional-modules-in-airflow:
+Creating a package in Python
+----------------------------
 
-Additional modules in Airflow
------------------------------
+This is most organized way of adding your custom code. Thanks to using the packages,
+you might organize your versioning approach, control which versions of the shared code are installed
+and deploy the code to all your instances and containers in controlled way - all by system admins/DevOps
+rather than by the DAG writers. It is usually suitable when you have separate team that manages this

Review comment:
       ```suggestion
   rather than by the DAG writers. It is usually suitable when you have a separate team that manages this
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``) that were added in Python 3.
+
+This is tempting to do something like that it in ``my_dag1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such shared dag using full path (starting from the directory which is added to
+``PYTHONPATH``:
 
-5. Build the wheel:
+.. code-block:: python
 
-.. code-block:: bash
+   from my_company.my_custom_dags.base_dag import BaseDag  # This is cool
 
-    python setup.py bdist_wheel
+The relative imports are counter-intuitive, depending on how you start your python code, they behave
+differently. In Airflow the same DAG file might be parsed in different context (by scheduler, by worker
+or during the tests) and in those cases, relatives imports might behave differently. Always use full
+python package path when you import anything in Airflow DAGs, this will save you a lot of troubles.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_

Review comment:
       ```suggestion
   The relative imports are counter-intuitive, and depending on how you start your python code, they can behave
   differently. In Airflow the same DAG file might be parsed in different contexts (by schedulers, by workers
   or during tests) and in those cases, relatives imports might behave differently. Always use full
   python package paths when you import anything in Airflow DAGs, this will save you a lot of troubles.
   You can read more about relative import caveats in
   `this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_.
   ```
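
    A minimal sketch of why the relative import fails, assuming a hypothetical `my_dag_1.py` is loaded as a top-level module (no parent package, which is how the DAG file processor parses DAG files) and that the `my_company` layout from the example above is on `PYTHONPATH`:

    ```python
    # Hypothetical my_dag_1.py, loaded as a top-level module (no parent package).
    try:
        from .base_dag import BaseDag  # relative import
    except ImportError as err:
        # Fails with "attempted relative import with no known parent package".
        print(f"relative import failed: {err}")

    # The absolute import works regardless of how the file is loaded,
    # as long as the directory containing my_company/ is on sys.path.
    from my_company.my_custom_dags.base_dag import BaseDag
    ```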

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below

Review comment:
       ```suggestion
      | .airflowignore  -- only needed in ``dags`` folder, see below
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694715558



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Just to add a few links, along with my thoughts and context on ``__init__.py`` in relation to our DAGs. I've spent quite some time looking at how the packages/namespaces work, and while I also had the impression at some point in time that "as of Python 3.3 we should drop the `__init__.py` altogether", my understanding of it is a bit different now.
   
   To focus the discussion - here is the PEP in question: https://www.python.org/dev/peps/pep-0420 
   
    And here is the particular chapter about the impact on import finders and loaders: https://www.python.org/dev/peps/pep-0420/#impact-on-import-finders-and-loaders
   
   My understanding of it:
   
    * lack of `__init__.py` in the folder means that (provided the import finder and loader support it) such a folder might be detected as an "implicit namespace package" and modules could be added to the package in their own namespace. Separate namespaces mean that the same package can be split across multiple folders. But it really depends on the capabilities of the loader/importer. For example, in setup.py you need to use `find_namespace_packages` instead of `find_packages` to find the namespace packages.
   
    * some tools and methods that look for packages might still not work well with those namespace packages (they might expect the package = single folder approach, for example, and break, or might not find the namespace packages at all). It's less and less common, but it happens (for example, in mypy you need to specify the `--namespace-packages` flag to find them: https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-namespace-packages)
   
    * things can get broken easily if the same package exists both as a `classic` package with `__init__.py` and as a namespace package. There are also a couple of legacy ways to provide namespace packages (not all compatible with each other), some of which are described here: https://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python, but I think there are also some other ways. Historically there have been ways to modify `__init__.py` so that it turns the package into a namespaced one, but they have some limitations (for example, you had to have exactly the same ``__init__.py`` content in all the incarnations of the same package in different namespaces). And if you mixed and matched the different ways of defining namespaces, they might or might not work and could behave `strangely`.
   
    * there is a serious performance implication of using "implicit namespace packages". There is even a note in PEP 420 that there is no intention to replace "classic" packages with implicit namespaces:
   
   > There is no intention to remove support of regular packages. If a developer knows that her package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an `__init__.py`). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created.
   
    I am not even sure if our DAG loader supports implicit namespaces, but even if it does, I think we should discourage the use of implicit namespaces for DAGs - not only because of the potential clashes and conflicts with "classic" packages or "explicit namespaces" that people can have on their pythonpath by accident, but also because of performance.
   
   I'd love to hear your thoughts about it :)
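
    A minimal `setup.py` sketch of the `find_packages` / `find_namespace_packages` difference mentioned above, assuming the `my_company` layout from the docs and a hypothetical distribution name:

    ```python
    import setuptools

    setuptools.setup(
        name="my-company-airflow-code",  # hypothetical distribution name
        # Regular packages (every folder contains an __init__.py) are collected by find_packages:
        packages=setuptools.find_packages(include=["my_company", "my_company.*"]),
        # If you deliberately relied on PEP 420 implicit namespace packages
        # (folders without __init__.py), you would need find_namespace_packages instead:
        # packages=setuptools.find_namespace_packages(include=["my_company", "my_company.*"]),
    )
    ```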
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694153747



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Maybe mention `__init__.py`'s and link down to the "Add __init__.py in package folders" section?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #17757: Improves documentation about modules management

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#issuecomment-904731026


   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


