Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/12 18:29:16 UTC

[GitHub] [airflow] rootcss opened a new pull request #10303: Add docs for how airflow manages packages and imports

rootcss opened a new pull request #10303:
URL: https://github.com/apache/airflow/pull/10303


   WIP. Adds docs about how Airflow manages packages and imports.
   
   related: https://github.com/apache/airflow/issues/8715
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-679307976


   @kaxil @potiuk @turbaszek PTAL. This is a fantastic article and I will be grateful for every comment.





[GitHub] [airflow] mik-laj commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-687573219


   The spell checks failed. Can you fix them?





[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469505026



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``, a list of directories that Python searches
+during imports. For example:

Review comment:
       ```suggestion
   The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python really tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`__ of this variable, depending, among other things, on the operating system and how Python is installed.
   
   You can check the contents of this variable for the current Python environment by starting an interactive Python session, as in the example below:
   ```
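A minimal sketch of the same check in script form, assuming only the standard library: it prints ``sys.path`` and uses ``importlib.util.find_spec`` to report which file an import of a top-level module would resolve to. The helper names ``show_search_path`` and ``locate_module`` are illustrative and are not part of Airflow.

```python
import importlib.util
import sys
from pprint import pprint


def show_search_path():
    """Print the directories Python searches during imports, in search order."""
    pprint(sys.path)


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    show_search_path()
    print(locate_module("json"))                # e.g. .../lib/python3.7/json/__init__.py
    print(locate_module("no_such_module_xyz"))  # None
```

Because ``find_spec`` walks ``sys.path`` in order, the reported file is the first match, which is useful when a module name is shadowed by another directory.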







[GitHub] [airflow] rootcss commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476522502



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in DAGs and in the Airflow configuration. The following article
+describes how you can create your own module so that Airflow can load it correctly, as well as how to diagnose problems
+when modules are not loaded properly.
+
+This article is essential reading if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable, depending,
+among other things, on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by starting an interactive Python
+session, as in the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first entry, ``sys.path[0]``, is the directory containing the
+script that was used to invoke the interpreter, or an empty string (the current directory) in an interactive shell.
+Next come the directories listed in ``PYTHONPATH``, if provided, followed by installation-dependent default paths,
+which are managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply appending to it
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the new
+paths once they're added. Airflow makes use of this feature, as described in the following sections.
+
+One of the directories in ``sys.path`` is ``site-packages``, which contains the installed **external packages**.
+This means you can install packages with ``pip`` or ``conda`` and use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+
+
+Creating a package in Python
+----------------------------
+
+1. Before starting, install the following packages:
+
+``setuptools``: setuptools is a package development process library designed for creating and distributing Python packages.
+
+``wheel``: The wheel package provides a ``bdist_wheel`` command for setuptools. It creates a ``.whl`` file which is directly
+installable through the ``pip install`` command. We can then upload the same file to pypi.org.
+
+.. code-block:: bash
+
+    pip install --upgrade pip setuptools wheel
+
+2. Create the package directory. In our case, we will call it ``airflow_operators``.
+
+.. code-block:: bash
+
+    mkdir airflow_operators
+
+3. Create the file ``__init__.py`` inside the package and add the following code:
+
+.. code-block:: python
+
+    print("Hello from airflow_operators")
+
+When we import this package, it should print the above message.
+
+4. Create ``setup.py``:
+
+.. code-block:: python
+
+    import setuptools
+
+    setuptools.setup(
+        name='airflow_operators',
+    )
+
+5. Build the wheel:
+
+.. code-block:: bash
+
+    python setup.py bdist_wheel
+
+This will create a few directories in the project, and the overall structure will look like the following:
+
+.. code-block:: text
+
+    .
+    ├── airflow_operators
+    │   └── __init__.py
+    ├── airflow_operators.egg-info
+    │   ├── PKG-INFO
+    │   ├── SOURCES.txt
+    │   ├── dependency_links.txt
+    │   └── top_level.txt
+    ├── build
+    │   └── bdist.macosx-10.15-x86_64
+    ├── dist
+    │   └── airflow_operators-0.0.0-py3-none-any.whl
+    └── setup.py
+
+
+6. Install the .whl file using pip:
+
+.. code-block:: bash
+
+    pip install dist/airflow_operators-0.0.0-py3-none-any.whl
+
+7. The package is now ready to use!
+
+.. code-block:: pycon
+
+  >>> import airflow_operators
+  Hello from airflow_operators
+  >>>
+
+The package can be removed using the ``pip`` command:
+
+.. code-block:: bash
+
+    pip uninstall airflow_operators
+
+For more details, see: https://packaging.python.org/tutorials/packaging-projects/
+
+
+Adding directories to the path
+------------------------------
+
+You can specify additional directories to be added to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+Start the Python shell with the path to the root of your project using the following command:
+
+.. code-block:: bash
+
+    PYTHONPATH=/home/arch/projects/airflow_operators python
+
+The ``sys.path`` variable will look like below:

Review comment:
       I've also added an ``airflow info`` example below this.
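The effect of ``PYTHONPATH`` described in the diff above can be checked without Airflow. This sketch assumes only the standard library and uses a temporary directory to stand in for a project root such as ``/home/arch/projects/airflow_operators``; it starts a fresh interpreter with ``PYTHONPATH`` set and returns that interpreter's ``sys.path``. ``syspath_with_pythonpath`` is a hypothetical helper name.

```python
import json
import os
import subprocess
import sys
import tempfile


def syspath_with_pythonpath(extra_dir):
    """Start a fresh interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # A temporary directory stands in for a real project root.
    with tempfile.TemporaryDirectory() as project_root:
        child_path = syspath_with_pythonpath(project_root)
        print(project_root in child_path)
```

A subprocess is used because ``PYTHONPATH`` is only read when an interpreter starts; changing ``os.environ`` inside an already running process does not alter its ``sys.path``.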







[GitHub] [airflow] mik-laj merged pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj merged pull request #10303:
URL: https://github.com/apache/airflow/pull/10303


   





[GitHub] [airflow] mik-laj commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-686821207


   Awesome work!





[GitHub] [airflow] rootcss commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-687594688


   @mik-laj looks good now.





[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469509348



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``, a list of directories that Python searches
+during imports. For example:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first entry, ``sys.path[0]``, is the directory containing the
+script that was used to invoke the interpreter, or an empty string (the current directory) in an interactive shell.
+Next come the directories listed in ``PYTHONPATH``, followed by installation-dependent default paths, which are
+managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply appending to it (for example,
+``sys.path.append("/path/to/custom/package")``). Python will start searching in the new paths once they're added.
+Airflow makes use of this feature as described in the next section.
+
+
+How does Airflow modify this behavior?
+--------------------------------------
+Airflow adds three additional directories to ``sys.path``:
+
+- ``conf.get('core', 'dags_folder')``
+- ``conf.get('core', 'airflow_home')``
+- ``conf.get('core', 'plugins_folder')``
+
+
+When and how can you affect the module loading mechanism?
+---------------------------------------------------------
+
+
+How to create a Python package with operators/plugins?
+------------------------------------------------------
+

Review comment:
       Install the required packages:
   
   - Setuptools: a package development process library designed for creating and distributing Python packages.
   - Wheel: provides a `bdist_wheel` command for setuptools. It creates a .whl file which is directly installable through the `pip install` command. We'll then upload the same file to pypi.org.
   
   
   1. `pip install --upgrade pip setuptools wheel`
   
   2. Create a directory for all files. In our case, we will call it `airflow_operators`.
   3. Create setup.py:
   
   ```python
   import setuptools
   
   setuptools.setup(
       name='airflow_operators',
   )
   ```
   4. Build a wheel:
   ```bash
   python setup.py bdist_wheel
   ```
   
   5. You can install the .whl file using pip:
   ```bash
   pip install dist/AAAAA
   ```
   
   See: https://packaging.python.org/tutorials/packaging-projects/
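Building a wheel is the route suggested above, but the import mechanism itself can be illustrated with a plain directory. This sketch assumes a throwaway temporary directory and a hypothetical package name, ``airflow_operators_demo``; it creates a package with an ``__init__.py``, appends the parent directory to ``sys.path`` and imports it, which is the same ``sys.path`` mechanism that ``PYTHONPATH`` and the extra Airflow folders rely on. ``import_from_directory`` is an illustrative helper, not an Airflow API.

```python
import importlib
import pathlib
import sys
import tempfile


def import_from_directory(root, package_name):
    """Put root on sys.path (if missing) and import package_name from it."""
    if root not in sys.path:
        sys.path.append(root)  # same effect as listing root in PYTHONPATH, but at runtime
    importlib.invalidate_caches()  # let the import system notice files created just now
    return importlib.import_module(package_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical package name, chosen to avoid clashing with real packages.
        package_dir = pathlib.Path(root, "airflow_operators_demo")
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text(
            'print("Hello from airflow_operators_demo")\nGREETING = "hello"\n'
        )
        module = import_from_directory(root, "airflow_operators_demo")
        print(module.GREETING)
```

The appended entry stays on ``sys.path`` after the temporary directory is deleted; a long-running process may want to remove it again.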







[GitHub] [airflow] kaxil commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r477603292



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,238 @@
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in DAGs and in the Airflow configuration. The following article
+describes how you can create your own module so that Airflow can load it correctly, as well as how to diagnose problems
+when modules are not loaded properly.
+
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python

Review comment:
       Nice - https://www.sphinx-doc.org/en/1.7/markup/inline.html#role-any thanks







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469507598



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``, a list of directories that Python searches
+during imports. For example:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first entry, ``sys.path[0]``, is the directory containing the
+script that was used to invoke the interpreter, or an empty string (the current directory) in an interactive shell.
+Next come the directories listed in ``PYTHONPATH``, followed by installation-dependent default paths, which are
+managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply appending to it (for example,
+``sys.path.append("/path/to/custom/package")``). Python will start searching in the new paths once they're added.
+Airflow makes use of this feature as described in the next section.
+
+
+How does Airflow modify this behavior?
+--------------------------------------
+Airflow adds three additional directories to ``sys.path``:
+
+- ``conf.get('core', 'dags_folder')``
+- ``conf.get('core', 'airflow_home')``
+- ``conf.get('core', 'plugins_folder')``
+

Review comment:
       ```suggestion
   You can use them in the same way as directories specified with the environment variable :envvar:`PYTHONPATH`.
   ```
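To make the suggestion above concrete, here is a minimal sketch of appending extra folders to ``sys.path`` while skipping duplicates. It is not Airflow's actual implementation; ``extend_syspath`` and the ``/root/airflow/...`` paths are hypothetical, chosen only to resemble the DAGs, config and plugins folders mentioned in this thread.

```python
import sys


def extend_syspath(directories, path=None):
    """Append each directory to the search path unless it is already present.

    Defaults to the live sys.path; pass a list to experiment without side effects.
    """
    path = sys.path if path is None else path
    for directory in directories:
        if directory not in path:
            path.append(directory)
    return path


if __name__ == "__main__":
    demo_path = ["/usr/lib/python3.7"]
    extend_syspath(
        ["/root/airflow/dags", "/root/airflow/config", "/root/airflow/plugins", "/root/airflow/dags"],
        demo_path,
    )
    print(demo_path)
    # ['/usr/lib/python3.7', '/root/airflow/dags', '/root/airflow/config', '/root/airflow/plugins']
```

Appending (rather than inserting at the front) means these folders are searched after the standard library and ``site-packages``, so a module in them cannot shadow an installed package of the same name.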







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469498469



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+Loading Packages in Python and Airflow

Review comment:
       ```suggestion
   Loading modules
   ```
   or 
   ```suggestion
   Module management
   ```
   The title should consist of 2-3 words to fit correctly in the menu.







[GitHub] [airflow] mik-laj commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-679307467


   This article is fantastic. I think it is worth increasing its visibility. Can you add links to it at different places in the documentation?
   https://airflow.readthedocs.io/en/latest/tutorial.html#importing-modules
   https://airflow.readthedocs.io/en/latest/concepts.html#custom-xcom-backend
   https://airflow.readthedocs.io/en/latest/concepts.html#where-to-put-airflow-local-settings-py
   https://airflow.readthedocs.io/en/latest/plugins.html
   https://airflow.readthedocs.io/en/latest/security/api.html#api-authentication
   https://airflow.readthedocs.io/en/latest/logging-monitoring/metrics.html#setup
   https://airflow.readthedocs.io/en/latest/howto/set-config.html#setting-configuration-options
   https://airflow.readthedocs.io/en/latest/howto/custom-operator.html#creating-a-custom-operator first note
   https://airflow.readthedocs.io/en/latest/howto/customize-state-colors-ui.html?highlight=PYTHONPATH#customizing-state-colours-in-ui
   https://airflow.readthedocs.io/en/latest/logging-monitoring/logging-tasks.html?highlight=PYTHONPATH#advanced-configuration
   https://airflow.readthedocs.io/en/latest/executor/celery.html?highlight=PYTHONPATH#celery-executor
   





[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469508434



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``, a list of directories that Python searches
+during imports. For example:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first entry, ``sys.path[0]``, is the directory containing the
+script that was used to invoke the interpreter, or an empty string (the current directory) in an interactive shell.
+Next come the directories listed in ``PYTHONPATH``, followed by installation-dependent default paths, which are
+managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply appending to it (for example,
+``sys.path.append("/path/to/custom/package")``). Python will start searching in the new paths once they're added.
+Airflow makes use of this feature as described in the next section.
+
+
+How does Airflow modify this behavior?
+--------------------------------------
+Airflow adds three additional directories to ``sys.path``:
+
+- ``conf.get('core', 'dags_folder')``
+- ``conf.get('core', 'airflow_home')``
+- ``conf.get('core', 'plugins_folder')``
+
+
+When and how can you affect the module loading mechanism?
+---------------------------------------------------------
+
+
+How to create a Python package with operators/plugins?
+------------------------------------------------------
+
+
+How to use PYTHONPATH?
+----------------------
+
+
+How do you check the contents of the sys.path variable?
+-------------------------------------------------------
+

Review comment:
       To check the current contents of the :any:`sys.path` variable in your environment, you can run ``airflow info``. An example of the contents of the ``sys.path`` variable reported by this command may be as follows:
   ```
   Python PATH: [/usr/local/bin:/opt/airflow:/usr/local/lib/python36.zip:/usr/local/lib/python3.6:/usr/local/lib/python3.6/lib-dynload:/usr/local/lib/python3.6/site-packages:/opt/airflow/airflow/providers/google/cloud/example_dags:/root/airflow/config:/root/airflow/plugins]
   ```
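When ``airflow info`` is not at hand, a similar listing can be produced, and missing folders spotted, from plain Python. In this sketch ``format_python_path`` and ``missing_from_syspath`` are hypothetical helpers; the output format only imitates the ``Python PATH: [...]`` line shown above and is not taken from Airflow's source.

```python
import os
import sys


def format_python_path(paths=None):
    """Join the search path into one line, separated by os.pathsep (":" on POSIX)."""
    paths = sys.path if paths is None else paths
    return "Python PATH: [" + os.pathsep.join(paths) + "]"


def missing_from_syspath(expected_dirs, paths=None):
    """Return the expected directories that do not appear on the search path.

    Entries are compared after os.path.abspath(), so "/opt/airflow/" matches
    "/opt/airflow" and the empty string is treated as the current directory.
    """
    paths = sys.path if paths is None else paths
    normalized = {os.path.abspath(p) for p in paths}
    return [d for d in expected_dirs if os.path.abspath(d) not in normalized]


if __name__ == "__main__":
    print(format_python_path())
    # Hypothetical folders you might expect to be importable:
    print(missing_from_syspath(["/root/airflow/config", "/root/airflow/plugins"]))
```

Normalizing with ``os.path.abspath`` avoids false alarms from trailing slashes; it does not resolve symlinks, so a folder reached through a symlink may still be reported as missing.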







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r477582202



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,238 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python

Review comment:
       It's a cross-reference, so we will have a link to the Python docs. When we link to a resource in another project's documentation, it is sometimes very difficult to determine the correct role.
   https://www.sphinx-doc.org/en/1.7/markup/inline.html#role-ref
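When a module is not loaded properly, it can help to check which ``sys.path`` entry Python actually resolves it from. Below is a minimal sketch using the standard library ``importlib.util.find_spec``. The helper ``module_origin`` is a hypothetical name, and it is meant for top-level module names, because for a dotted name ``find_spec`` imports the parent package first:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return where a top-level module would be loaded from, or None if not found."""
    # find_spec consults the finders on sys.meta_path (PathFinder searches
    # sys.path) without executing the module itself.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For source modules this is the file path; for built-in modules it is
    # typically the string "built-in".
    return spec.origin


if __name__ == "__main__":
    for name in ("json", "airflow"):
        print(name, "->", module_origin(name))
```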







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469499957



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Loading Packages in Python and Airflow
+======================================
+

Review comment:
       ```suggestion
   Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems when modules are not loaded properly.
   
   This article is the last one for you if you need to adapt Airflow to the needs of your organization.
   ```







[GitHub] [airflow] mik-laj commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-673084366


   Hi. I had a draft of this article and I included some snippets in the comment. Hope it will be helpful.





[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r475822497



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable,
+depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the directory of the script being run:
+``sys.path[0]`` is the directory containing the script that was used to invoke Python, or an empty string (meaning the
+current directory) in an interactive shell. Second precedence is given to ``PYTHONPATH`` if provided, followed by
+installation-dependent default paths, which are managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply using append
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the newer
+paths once they're added. Airflow makes use of this feature as described in the further sections.

Review comment:
       > in the further sections
   
   Can you add a link to section?
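For reference, the runtime ``sys.path.append`` behaviour described in the quoted paragraph can be sketched as follows. The sketch assumes a throw-away module with the hypothetical name ``my_custom_module`` in a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a tiny module in a directory that is not on sys.path yet.
    Path(tmp_dir, "my_custom_module.py").write_text("GREETING = 'hello'\n")

    # Append the directory at runtime; later imports also search it.
    sys.path.append(tmp_dir)
    # Not strictly needed for a brand-new directory, but recommended whenever
    # files are created while the interpreter is running.
    importlib.invalidate_caches()

    my_custom_module = importlib.import_module("my_custom_module")
    print(my_custom_module.GREETING)  # hello

    # Clean up so the temporary directory does not stay on the search path.
    sys.path.remove(tmp_dir)
```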







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r475825959



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable,
+depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the directory of the script being run:
+``sys.path[0]`` is the directory containing the script that was used to invoke Python, or an empty string (meaning the
+current directory) in an interactive shell. Second precedence is given to ``PYTHONPATH`` if provided, followed by
+installation-dependent default paths, which are managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply using append
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the newer
+paths once they're added. Airflow makes use of this feature as described in the further sections.
+
+One of the directories in ``sys.path`` is ``site-packages``, which contains the installed **external packages**.
+This means you can install packages with ``pip`` or ``anaconda`` and use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+
+
+Creating a package in Python
+----------------------------
+
+1. Before starting, install the following packages:
+
+``setuptools``: setuptools is a package development process library designed for creating and distributing Python packages.
+
+``wheel``: The wheel package provides a ``bdist_wheel`` command for setuptools. It creates a ``.whl`` file, which can be
+installed directly with the ``pip install`` command. The same file can then be uploaded to pypi.org.
+
+.. code-block:: bash
+
+    pip install --upgrade pip setuptools wheel
+
+2. Create the package directory - in our case, we will call it ``airflow_operators``.
+
+.. code-block:: bash
+
+    mkdir airflow_operators
+
+3. Create the file ``__init__.py`` inside the package and add the following code:
+
+.. code-block:: python
+
+    print("Hello from airflow_operators")
+
+When we import this package, it should print the above message.
+
+4. Create ``setup.py``:
+
+.. code-block:: python
+
+    import setuptools
+
+    setuptools.setup(
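+        name='airflow_operators',
+        # Include the airflow_operators package (and any subpackages) in the wheel.
+        packages=setuptools.find_packages(),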
+    )
+
+5. Build the wheel:
+
+.. code-block:: bash
+
+    python setup.py bdist_wheel
+
+This will create a few directories in the project and the overall structure will look like the following:
+
+.. code-block:: bash
+
+    .
+    ├── airflow_operators
+    │   └── __init__.py
+    ├── airflow_operators.egg-info
+    │   ├── PKG-INFO
+    │   ├── SOURCES.txt
+    │   ├── dependency_links.txt
+    │   └── top_level.txt
+    ├── build
+    │   └── bdist.macosx-10.15-x86_64
+    ├── dist
+    │   └── airflow_operators-0.0.0-py3-none-any.whl
+    └── setup.py
+
+
+6. Install the .whl file using pip:
+
+.. code-block:: bash
+
+    pip install dist/airflow_operators-0.0.0-py3-none-any.whl
+
+7. The package is now ready to use!
+
+.. code-block:: pycon
+
+  >>> import airflow_operators
+  Hello from airflow_operators
+  >>>
+
+The package can be removed using the pip command:
+
+.. code-block:: bash
+
+    pip uninstall airflow_operators
+
+For more details, see: https://packaging.python.org/tutorials/packaging-projects/
+
+
+Adding directories to the path
+------------------------------
+
+You can specify additional directories to be added to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+Start the Python shell, providing the path to the root of your project, with the following command:
+
+.. code-block:: bash
+
+    PYTHONPATH=/home/arch/projects/airflow_operators python
+
+The ``sys.path`` variable will look like this:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/projects/airflow_operators',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+As we can see, our directory has now been added to the path. Let's try to import the package:
+
+.. code-block:: pycon
+
+    >>> import airflow_operators
+    Hello from airflow_operators
+    >>>
+
+
+Additional modules in Airflow
+-----------------------------
+Airflow adds three additional directories to the ``sys.path``:
+
+- ``conf.get('core', 'dags_folder')``

Review comment:
       ```suggestion
   - DAG folder: It is configured with option ``dags_folder`` in section ``[core]``.
   ```
   Could you use a word description here instead of a code snippet? It may not be understandable for everyone.
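To illustrate the section/option wording, here is a minimal sketch that reads ``dags_folder`` from the ``[core]`` section of an ``airflow.cfg``-style file with the standard ``configparser`` module. The file contents are hypothetical. This is not how Airflow itself loads its configuration; the sketch only shows the naming. In Airflow, the same option can also be set with the ``AIRFLOW__CORE__DAGS_FOLDER`` environment variable.

```python
import configparser
import sys

# Hypothetical excerpt of an airflow.cfg file; real files contain many more options.
AIRFLOW_CFG = """
[core]
dags_folder = /opt/airflow/dags
"""

config = configparser.ConfigParser()
config.read_string(AIRFLOW_CFG)

# The section is ``[core]`` and the option is ``dags_folder``.
dags_folder = config.get("core", "dags_folder")
print(dags_folder)  # /opt/airflow/dags

# Illustration only: make modules in the DAG folder importable, as the quoted
# text says Airflow does for this directory.
if dags_folder not in sys.path:
    sys.path.append(dags_folder)
```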







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r475823484



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable,
+depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the directory of the script being run:
+``sys.path[0]`` is the directory containing the script that was used to invoke Python, or an empty string (meaning the
+current directory) in an interactive shell. Second precedence is given to ``PYTHONPATH`` if provided, followed by
+installation-dependent default paths, which are managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply using append
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the newer
+paths once they're added. Airflow makes use of this feature as described in the further sections.
+
+One of the directories in ``sys.path`` is ``site-packages``, which contains the installed **external packages**.
+This means you can install packages with ``pip`` or ``anaconda`` and use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+
+
+Creating a package in Python
+----------------------------
+
+1. Before starting, install the following packages:
+
+``setuptools``: setuptools is a package development process library designed for creating and distributing Python packages.
+
+``wheel``: The wheel package provides a ``bdist_wheel`` command for setuptools. It creates a ``.whl`` file, which can be
+installed directly with the ``pip install`` command. The same file can then be uploaded to pypi.org.
+
+.. code-block:: bash
+
+    pip install --upgrade pip setuptools wheel
+
+2. Create the package directory - in our case, we will call it ``airflow_operators``.
+
+.. code-block:: bash
+
+    mkdir airflow_operators
+
+3. Create the file ``__init__.py`` inside the package and add the following code:
+
+.. code-block:: python
+
+    print("Hello from airflow_operators")
+
+When we import this package, it should print the above message.
+
+4. Create ``setup.py``:
+
+.. code-block:: python
+
+    import setuptools
+
+    setuptools.setup(
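+        name='airflow_operators',
+        # Include the airflow_operators package (and any subpackages) in the wheel.
+        packages=setuptools.find_packages(),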
+    )
+
+5. Build the wheel:
+
+.. code-block:: bash
+
+    python setup.py bdist_wheel
+
+This will create a few directories in the project and the overall structure will look like the following:
+
+.. code-block:: bash
+
+    .
+    ├── airflow_operators
+    │   └── __init__.py
+    ├── airflow_operators.egg-info
+    │   ├── PKG-INFO
+    │   ├── SOURCES.txt
+    │   ├── dependency_links.txt
+    │   └── top_level.txt
+    ├── build
+    │   └── bdist.macosx-10.15-x86_64
+    ├── dist
+    │   └── airflow_operators-0.0.0-py3-none-any.whl
+    └── setup.py
+
+
+6. Install the .whl file using pip:
+
+.. code-block:: bash
+
+    pip install dist/airflow_operators-0.0.0-py3-none-any.whl
+
+7. The package is now ready to use!
+
+.. code-block:: pycon
+
+  >>> import airflow_operators
+  Hello from airflow_operators
+  >>>
+
+The package can be removed using the pip command:
+
+.. code-block:: bash
+
+    pip uninstall airflow_operators
+
+For more details, see: https://packaging.python.org/tutorials/packaging-projects/
+
+
+Adding directories to the path
+------------------------------
+
+You can specify additional directories to be added to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+Start the Python shell, providing the path to the root of your project, with the following command:
+
+.. code-block:: bash
+
+    PYTHONPATH=/home/arch/projects/airflow_operators python
+
+The ``sys.path`` variable will look like below:

Review comment:
       ``airflow info`` please.
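For an automated check of the same behaviour, here is a minimal sketch that starts a fresh interpreter with ``PYTHONPATH`` set and returns its ``sys.path``. The helper name is hypothetical, and ``airflow info`` remains the suggested way to inspect a real Airflow installation:

```python
import ast
import os
import subprocess
import sys
import tempfile


def sys_path_with_pythonpath(extra_dir: str) -> list:
    """Start a new interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(repr(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # The child prints its sys.path as a Python literal; parse it safely.
    return ast.literal_eval(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project_root:
        for entry in sys_path_with_pythonpath(project_root):
            print(entry)
```

With ``-c``, the first entry is normally the empty string, so the ``PYTHONPATH`` directory appears right after it, ahead of the standard library and ``site-packages``.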







[GitHub] [airflow] potiuk commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476264200



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable,
+depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the directory of the script being run:
+``sys.path[0]`` is the directory containing the script that was used to invoke Python, or an empty string (meaning the
+current directory) in an interactive shell. Second precedence is given to ``PYTHONPATH`` if provided, followed by
+installation-dependent default paths, which are managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply using append
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the newer
+paths once they're added. Airflow makes use of this feature as described in the further sections.
+
+One of the directories in ``sys.path`` is ``site-packages``, which contains the installed **external packages**.
+This means you can install packages with ``pip`` or ``anaconda`` and use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+

Review comment:
       I think it would be worth adding a section about .pth files (https://docs.python.org/3/library/site.html#module-site). I often find them invaluable (especially in production installations) for modularizing access to different parts of the code. Big organisations often have a lot of independent modules and components, and often they are not installed as "pip" packages (for various reasons: compilation needs, the necessity to use code from sources, etc.). In those cases, listing the paths to search in .pth files is a really nice way of modularizing such access: you just need to drop the .pth file into one of the site-packages directories. A .pth file also has the nice property that it can contain executable lines (lines starting with `import`) that are executed at every Python interpreter start. .pth files are also used by big packages that need to be installed from sources (for example, ROS uses .pth files extensively http://wiki.ros.org/rospy).
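Here is a minimal sketch of the .pth mechanism described above. It uses the standard library ``site.addsitedir`` on temporary directories. The names ``site_dir``, ``team_component`` and ``team_utils`` are hypothetical and stand in for a site-packages directory and a component checked out from source:

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical stand-in for a site-packages directory.
    site_dir = base / "site_dir"
    # Hypothetical source checkout that is not installed with pip.
    component_dir = base / "team_component"
    site_dir.mkdir()
    component_dir.mkdir()
    (component_dir / "team_utils.py").write_text("VALUE = 42\n")

    # Each plain line of a .pth file names a directory to add to sys.path
    # (only if it exists); lines starting with "import" are executed instead.
    (site_dir / "team_component.pth").write_text(f"{component_dir}\n")

    # addsitedir() adds site_dir itself and processes every .pth file in it,
    # which is what the site module does for site-packages at start-up.
    site.addsitedir(str(site_dir))
    print(str(component_dir) in sys.path)  # True

    team_utils = importlib.import_module("team_utils")
    print(team_utils.VALUE)  # 42
```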







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469495888



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?

Review comment:
       If possible, we should avoid questions in headings.







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r475823179



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable,
+depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the directory of the script being run:
+``sys.path[0]`` is the directory containing the script that was used to invoke Python, or an empty string (meaning the
+current directory) in an interactive shell. Second precedence is given to ``PYTHONPATH`` if provided, followed by
+installation-dependent default paths, which are managed by the `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+Because ``sys.path`` is an ordinary list, it can also be modified during a Python session, for example with
+``sys.path.append("/path/to/custom/package")``. Any import made after the change also searches the added
+directories. Airflow makes use of this feature, as described in the following sections.
+
+One of the directories in ``sys.path`` is ``site-packages``, which contains the installed **external packages**.
+This means you can install packages with ``pip`` or ``conda`` and use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be
+added to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+
+
+Creating a package in Python
+----------------------------
+
+1. Before starting, install the following packages:
+
+``setuptools``: a package development library designed for creating and distributing Python packages.
+
+``wheel``: provides the ``bdist_wheel`` command for ``setuptools``. It creates a ``.whl`` file, which can be
+installed directly with the ``pip install`` command. The same file can also be uploaded to `PyPI <https://pypi.org>`_.
+
+.. code-block:: bash
+
+    pip install --upgrade pip setuptools wheel
+
+2. Create the package directory. In this example, it is called ``airflow_operators``.
+
+.. code-block:: bash
+
+    mkdir airflow_operators
+
+3. Create the file ``__init__.py`` inside the package directory and add the following code:
+
+.. code-block:: python
+
+    print("Hello from airflow_operators")
+
+When the package is imported, it prints this message.
+
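+4. Create ``setup.py`` next to the package directory:
+
+.. code-block:: python
+
+    import setuptools
+
+    setuptools.setup(
+        name='airflow_operators',
+        # List the packages to include; older setuptools versions include none by default.
+        packages=setuptools.find_packages(),
+    )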
+
+5. Build the wheel:
+
+.. code-block:: bash
+
+    python setup.py bdist_wheel
+
+This will create a few directories in the project, and the overall structure will look like the following:
+
+.. code-block:: text
+
+    .
+    ├── airflow_operators
+    │   └── __init__.py
+    ├── airflow_operators.egg-info
+    │   ├── PKG-INFO
+    │   ├── SOURCES.txt
+    │   ├── dependency_links.txt
+    │   └── top_level.txt
+    ├── build
+    │   └── bdist.macosx-10.15-x86_64
+    ├── dist
+    │   └── airflow_operators-0.0.0-py3-none-any.whl
+    └── setup.py
+
+
+6. Install the ``.whl`` file using ``pip``:
+
+.. code-block:: bash
+
+    pip install dist/airflow_operators-0.0.0-py3-none-any.whl
+
+7. The package is now ready to use!
+
+.. code-block:: pycon
+
+    >>> import airflow_operators
+    Hello from airflow_operators
+    >>>
+
+The package can be removed with the ``pip uninstall`` command:
+
+.. code-block:: bash
+
+    pip uninstall airflow_operators
+
+For more details, see: https://packaging.python.org/tutorials/packaging-projects/

Review comment:
       Please use meaningful link text.
   https://developers.google.com/style/cross-references
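For the packaging steps quoted above: ``sys.path[0]`` is the script's directory, or the current directory in an interactive shell. Because of this, ``import airflow_operators`` can succeed from the project root even when the installed wheel does not contain the package. The following is a minimal standard-library sketch for checking which file a module would actually be loaded from. ``module_origin`` is a hypothetical helper, not part of Airflow:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if it cannot be found.

    Hypothetical helper for illustration. It does not execute the module itself,
    although find_spec() does import the parent packages of a dotted name.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # After installing the wheel, this should point into site-packages,
    # not into the project directory you are running from.
    print(module_origin("airflow_operators"))
```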
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] rootcss commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476518701



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.

Review comment:
       I've removed it.







[GitHub] [airflow] rootcss commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476472934



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents of <https://stackoverflow.com/a/38403654>`_ of this variable,
+including depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the current directory,
+i.e, ``path[0]`` is the directory containing the current script that was used to invoke or an empty string in case
+it was an interactive shell. Second precedence is given to the ``PYTHONPATH`` if provided, followed by installation-dependent
+default paths which is managed by `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply using append
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the newer
+paths once they're added. Airflow makes use of this feature as described in the further sections.
+
+In the variable ``sys.path`` there is a directory ``site-packages`` which contains the installed **external packages**,
+which means you can install packages with ``pip`` or ``anaconda`` and you can use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+
+
+Creating a package in Python
+----------------------------
+
+1. Before starting, install the following packages:
+
+``setuptools``: setuptools is a package development process library designed for creating and distributing Python packages.
+
+``wheel``: The wheel package provides a bdist_wheel command for setuptools. It creates .whl file which is directly
+installable through the ``pip install`` command. We can then upload the same file to pypi.org.
+
+.. code-block:: bash
+    pip install --upgrade pip setuptools wheel
+
+2. Create the package directory - in our case, we will call it ``airflow_operators``.
+
+.. code-block:: bash
+    mkdir airflow_operators
+
+3. Create the file ``__init__.py`` inside the package and add following code:
+
+.. code-block:: python
+
+    print("Hello from airflow_operators")
+
+When we import this package, it should print the above message.
+
+4. Create ``setup.py``:
+
+.. code-block:: python
+
+    import setuptools
+
+    setuptools.setup(
+        name='airflow_operators',
+    )
+
+5. Build the wheel:
+
+.. code-block:: bash
+
+    python setup.py bdist_wheel
+
+This will create a few directories in the project and the overall structure will look like following:
+
+.. code-block:: bash
+
+    .
+    ├── airflow_operators
+    │   ├── __init__.py
+    ├── airflow_operators.egg-info
+    │   ├── PKG-INFO
+    │   ├── SOURCES.txt
+    │   ├── dependency_links.txt
+    │   └── top_level.txt
+    ├── build
+    │   └── bdist.macosx-10.15-x86_64
+    ├── dist
+    │   └── airflow_operators-0.0.0-py3-none-any.whl
+    └── setup.py
+
+
+6. Install the .whl file using pip:
+
+.. code-block:: bash
+
+    pip install dist/airflow_operators-0.0.0-py3-none-any.whl
+
+7. The package is now ready to use!
+
+.. code-block:: pycon
+
+  >>> import airflow_operators
+  Hello from airflow_operators
+  >>>
+
+The package can be removed using pip command:
+
+.. code-block:: bash
+
+    pip uninstall airflow_operators
+
+For more details, see: https://packaging.python.org/tutorials/packaging-projects/

Review comment:
       Thank you.
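You can list the contents of the wheel from step 5 of the quoted document to confirm that it ships the package code and not only metadata. The following is a minimal sketch using the standard ``zipfile`` module, since a wheel is a zip archive. ``wheel_contains_package`` is a hypothetical helper, and the default path is the one shown in the quoted directory tree:

```python
import os
import sys
import zipfile


def wheel_contains_package(wheel, package: str) -> bool:
    """Return True if the wheel ships `<package>/__init__.py`.

    `wheel` may be a path to a .whl file or a binary file object.
    Zip member names always use forward slashes, on every platform.
    """
    with zipfile.ZipFile(wheel) as archive:
        return f"{package}/__init__.py" in archive.namelist()


if __name__ == "__main__":
    # Default path taken from the example layout; pass another path as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "dist/airflow_operators-0.0.0-py3-none-any.whl"
    if os.path.exists(path):
        print(wheel_contains_package(path, "airflow_operators"))
    else:
        print(f"{path} not found; build the wheel first")
```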







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r475824116



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents of <https://stackoverflow.com/a/38403654>`_ of this variable,
+including depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in

Review comment:
       The content of this variable varies depending on whether it is read from an interactive console or from a regular program, and Airflow also modifies it. As a result, the content of this variable may differ between what the user sees and what Airflow uses. To address this problem, I added a new command `airflow info` that displays sys.path.
   > Python PATH: [/usr/local/bin:/opt/airflow:/usr/local/lib/python37.zip:/usr/local/lib/python3.7:/usr/local/lib/python3.7/lib-dynload:/usr/local/lib/python3.7/site-packages:/files/dags:/root/airflow/config:/root/airflow/plugins]
   
   If you want, you can also change the Python PATH label to sys.path to make the guide easier to read.
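To illustrate the point above that ``sys.path`` differs between launch modes, the following minimal sketch starts two child interpreters with the same executable. One is launched with ``python -c`` and one runs a script file. It relies only on documented CPython behavior: ``-c`` prepends an empty string, and a script prepends its own directory. The helper names are hypothetical. ``PYTHONSAFEPATH`` (Python 3.11+) is removed from the child environment because it disables this prepending:

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Code run in the child interpreter: dump its sys.path as JSON.
PROBE = "import json, sys; print(json.dumps(sys.path))"


def _child_env() -> dict:
    # PYTHONSAFEPATH (Python 3.11+) stops Python from prepending sys.path[0].
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)
    return env


def sys_path_with_dash_c() -> list:
    """sys.path as seen by `python -c ...`."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True, text=True, check=True, env=_child_env(),
    )
    return json.loads(result.stdout)


def sys_path_with_script() -> tuple:
    """sys.path as seen by `python probe.py`, plus the resolved script directory."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "probe.py"
        script.write_text(PROBE)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, check=True, env=_child_env(),
        )
        return json.loads(result.stdout), os.path.realpath(tmp)


if __name__ == "__main__":
    print("python -c ->", repr(sys_path_with_dash_c()[0]))
    path, script_dir = sys_path_with_script()
    print("script    ->", path[0], "(script dir:", script_dir + ")")
```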







[GitHub] [airflow] rootcss commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476504975



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents of <https://stackoverflow.com/a/38403654>`_ of this variable,
+including depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in

Review comment:
       I've added the details about airflow info and output of it in the airflow section of this doc. what do you think about that? 
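As a complement to `airflow info`, the following sketch checks from your own code whether a particular directory is currently on ``sys.path``, for example the DAGs folder shown in the `airflow info` output quoted above. It is a minimal sketch, and ``is_on_sys_path`` is a hypothetical helper. It compares resolved paths so that symlinks and relative entries do not cause false negatives:

```python
import os
import sys
from typing import Iterable, Optional


def is_on_sys_path(directory: str, search_path: Optional[Iterable[str]] = None) -> bool:
    """Return True if `directory` appears in `search_path` (defaults to sys.path).

    Entries are compared after os.path.realpath(), so an empty entry ("")
    is treated as the current working directory.
    """
    entries = sys.path if search_path is None else search_path
    target = os.path.realpath(directory)
    return any(os.path.realpath(entry or os.curdir) == target for entry in entries)


if __name__ == "__main__":
    print(is_on_sys_path(os.getcwd()))
```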







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469505904



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``, a list of directories that Python searches
+during imports. For example:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first entry, ``sys.path[0]``, is the directory containing
+the script that was used to invoke the Python interpreter, or an empty string if Python was started as an interactive
+shell. Next comes ``PYTHONPATH``, followed by installation-dependent default paths, which are managed by the
+`site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+

Review comment:
       > In the variable `sys.path` there is a directory `site-packages` which contains installed **external packages**, which means you can install packages with `pip` or `anaconda` and you can use them in Airflow. In the __TODO__ section, you will learn how to create your own simple installable package.
   
   > You can specify additional directories to be added to ``sys.path`` with the environment variable :envvar:`PYTHONPATH`. See section TODO for more information.
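The ``PYTHONPATH`` behavior described in this suggestion can be checked with a short sketch. It starts a child interpreter with a temporary directory in ``PYTHONPATH`` and confirms that this directory is searched before any ``site-packages`` (or Debian-style ``dist-packages``) entry. The helper names are hypothetical, and only the standard library is used:

```python
import json
import os
import subprocess
import sys
import tempfile

# Code run in the child interpreter: dump its sys.path as JSON.
PROBE = "import json, sys; print(json.dumps(sys.path))"


def child_sys_path(extra_dir: str) -> list:
    """Return sys.path of a child interpreter started with PYTHONPATH=extra_dir."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True, text=True, check=True, env=env,
    )
    return json.loads(result.stdout)


def pythonpath_precedes_site_packages(extra_dir: str) -> bool:
    """True if extra_dir comes before every site-packages/dist-packages entry."""
    paths = [os.path.realpath(p) for p in child_sys_path(extra_dir)]
    extra_index = paths.index(os.path.realpath(extra_dir))  # ValueError if missing
    site_indexes = [
        i for i, p in enumerate(paths)
        if os.path.basename(p) in ("site-packages", "dist-packages")
    ]
    return all(extra_index < i for i in site_indexes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(pythonpath_precedes_site_packages(tmp))
```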







[GitHub] [airflow] kaxil commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r477568354



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,238 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents of <https://stackoverflow.com/a/38403654>`_ of this variable,

Review comment:
       Grammar in this sentence needs fixing I think! Please correct me if I am wrong though :) 







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469505904



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ package provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``. It's a list of directories which is searched by
+Python during imports. for example,
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the current directory,
+i.e, ``path[0]`` is the directory containing the current script that was used to invoke or an empty string in case
+it was an interactive shell. Second precedence is given to the ``PYTHONPATH``, followed by installation-dependent
+default paths which is managed by `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+

Review comment:
       ```
   In the variable `sys.path` there is a directory `site-packages` which contains installed **external packages**, which means you can install packages with `pip` or `anaconda` and you can use them in Airflow. In the __TODO__ section, you will learn how to create your own simple installable package.
   ```







[GitHub] [airflow] potiuk commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476258201



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.

Review comment:
       Is it really the last one :)? I am not sure if that sentence adds any value here :)







[GitHub] [airflow] rootcss commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-673238169


   Sure. 👍 I'm pushing a draft with all the content so far. The goal is to refactor and polish the document at the end. 





[GitHub] [airflow] potiuk commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476258508



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents of <https://stackoverflow.com/a/38403654>`_ of this variable,
+including depending on the operating system and how Python is installed.

Review comment:
       ```suggestion
   including depending on the operating system and how Python is installed and which Python version is used.
   ```
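As this suggestion notes, the default entries depend on the platform, the installation, and the Python version. The following small sketch uses the standard ``sys`` and ``sysconfig`` modules to print those factors together with the installation's standard-library and ``purelib`` directories. This is handy when comparing two environments, for example a developer machine and an Airflow worker. The helper name ``environment_summary`` is hypothetical:

```python
import sys
import sysconfig


def environment_summary() -> dict:
    """Collect the factors that shape the default sys.path entries."""
    paths = sysconfig.get_paths()
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "platform": sys.platform,
        "executable": sys.executable,
        "prefix": sys.prefix,            # differs from base_prefix inside a virtualenv
        "base_prefix": sys.base_prefix,
        "stdlib": paths["stdlib"],       # standard library directory
        "purelib": paths["purelib"],     # typically the site-packages directory used by pip
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key:12} {value}")
```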







[GitHub] [airflow] rootcss commented on pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
rootcss commented on pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#issuecomment-687574043


   yes @mik-laj, looking at them. 





[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469502515



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used

Review comment:
       ```
   Airflow uses the standard Python loader to load additional modules. Any module that this loader can find and load can also be used.
   ```
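The point about the standard loader can be illustrated with a short sketch. A module written into a directory that is not yet on ``sys.path`` becomes importable through the normal import system once that directory is appended. The helper ``import_from_directory`` and the module name ``my_company_utils`` are hypothetical and used for illustration only.

```python
import importlib
import os
import sys
import tempfile


def import_from_directory(directory, module_name):
    """Make ``directory`` importable by appending it to sys.path, then import
    ``module_name`` through the standard import system."""
    if directory not in sys.path:
        sys.path.append(directory)
    # Clear finder caches so that a file created after interpreter start is found.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as custom_dir:
        # Write a tiny, hypothetical module into a directory not yet on sys.path.
        with open(os.path.join(custom_dir, "my_company_utils.py"), "w") as f:
            f.write("def greet(name):\n    return f'Hello, {name}!'\n")

        module = import_from_directory(custom_dir, "my_company_utils")
        print(module.greet("Airflow"))  # Hello, Airflow!
```

Nothing here is Airflow-specific. Airflow relies on the same mechanism when it adds its own directories to ``sys.path``.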







[GitHub] [airflow] mik-laj commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r469502945



##########
File path: docs/packages.rst
##########
@@ -0,0 +1,75 @@
+Loading Packages in Python and Airflow
+======================================
+
+How does Python load packages?
+------------------------------
+Python's `sys <https://docs.python.org/3/library/sys.html>`_ module provides key variables and functions that are used
+or maintained by the interpreter. One such variable is ``sys.path``, the list of directories that Python searches
+during imports. For example:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first entry, ``sys.path[0]``, is the directory containing the
+script that was used to invoke the Python interpreter, or an empty string if the interpreter was started interactively
+(in which case the current directory is searched first). Next come the directories listed in ``PYTHONPATH``, followed
+by installation-dependent defaults: the standard library and the site-packages directories added by the
+`site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified at runtime, for example with ``sys.path.append("/path/to/custom/package")``. Imports
+made after the change also search the newly added directories. Airflow makes use of this feature, as described in the next section.
+
+
+How does Airflow modify this behavior?
+--------------------------------------
+Airflow adds three additional directories to ``sys.path``:
+
+- ``conf.get('core', 'dags_folder')``
+- ``conf.get('core', 'airflow_home')``

Review comment:
       ```suggestion
   - ``conf.get('core', 'airflow_home')/config``
   ```
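The directory handling discussed in this hunk can be sketched in a few lines. The sketch covers only the two directories shown here: the DAGs folder and ``<airflow_home>/config``. It assumes a plain dict stands in for the ``[core]`` section of Airflow's ``conf``. The function name ``extend_sys_path`` is hypothetical, and this is not Airflow's actual implementation.

```python
import os
import sys


def extend_sys_path(core_config):
    """Append the DAGs folder and <airflow_home>/config to sys.path.

    ``core_config`` is a hypothetical stand-in for the ``[core]`` section of
    Airflow's configuration, e.g. {"dags_folder": ..., "airflow_home": ...}.
    Returns the directories that were actually added.
    """
    candidates = [
        core_config["dags_folder"],
        os.path.join(core_config["airflow_home"], "config"),
    ]
    added = []
    for directory in candidates:
        # Skip directories already present, so repeated calls do not duplicate entries.
        if directory not in sys.path:
            sys.path.append(directory)
            added.append(directory)
    return added


if __name__ == "__main__":
    home = os.path.expanduser("~/airflow")
    print(extend_sys_path({"dags_folder": os.path.join(home, "dags"), "airflow_home": home}))
```

Appending, rather than inserting at the front, means these directories are searched after the standard library and installed packages. A local module with the same name as an installed package will therefore not shadow it.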







[GitHub] [airflow] kaxil commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r477567198



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,238 @@
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python

Review comment:
       What does the ":any:" do?



