Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/25 14:02:46 UTC

[GitHub] [airflow] rootcss commented on a change in pull request #10303: Add docs for how airflow manages packages and imports

rootcss commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476472934



##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in DAGs and in the Airflow configuration. This article
+describes how to create your own module so that Airflow can load it correctly, and how to diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load a module is given by the variable :any:`sys.path`. Python
+tries to `intelligently determine the contents <https://stackoverflow.com/a/38403654>`_ of this variable,
+depending, among other things, on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. First precedence is given to the current directory,
+i.e., ``path[0]`` is the directory containing the script that was used to invoke the Python interpreter, or an
+empty string if the interpreter was started as an interactive shell. Second precedence is given to
+``PYTHONPATH``, if provided, followed by the installation-dependent default paths, which are managed by the
+`site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session, for example with
+``sys.path.append("/path/to/custom/package")``. Python searches the newly added paths for packages as soon as
+they are added. Airflow makes use of this feature, as described in later sections.
+
+``sys.path`` includes a ``site-packages`` directory, which contains the installed **external packages**.
+This means that packages installed with ``pip`` or ``anaconda`` can be used in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+
+
+Creating a package in Python
+----------------------------
+
+1. Before starting, install the following packages:
+
+``setuptools``: a library for building and distributing Python packages.
+
+``wheel``: provides the ``bdist_wheel`` command for setuptools. This command creates a ``.whl`` file that can be
+installed directly with ``pip install``. The same file can then be uploaded to pypi.org.
+
+.. code-block:: bash
+
+    pip install --upgrade pip setuptools wheel
+
+2. Create the package directory - in our case, we will call it ``airflow_operators``.
+
+.. code-block:: bash
+
+    mkdir airflow_operators
+
+3. Create the file ``__init__.py`` inside the package and add the following code:
+
+.. code-block:: python
+
+    print("Hello from airflow_operators")
+
+When we import this package, it should print the above message.
+
+4. Create ``setup.py``:
+
+.. code-block:: python
+
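+    import setuptools
+
+    setuptools.setup(
+        name='airflow_operators',
+        # Without ``packages``, older setuptools releases build a wheel with no modules in it.
+        packages=setuptools.find_packages(),
+    )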
+
+5. Build the wheel:
+
+.. code-block:: bash
+
+    python setup.py bdist_wheel
+
+This creates a few directories in the project. The overall structure will look like the following:
+
+.. code-block:: bash
+
+    .
+    ├── airflow_operators
+    │   └── __init__.py
+    ├── airflow_operators.egg-info
+    │   ├── PKG-INFO
+    │   ├── SOURCES.txt
+    │   ├── dependency_links.txt
+    │   └── top_level.txt
+    ├── build
+    │   └── bdist.macosx-10.15-x86_64
+    ├── dist
+    │   └── airflow_operators-0.0.0-py3-none-any.whl
+    └── setup.py
+
+
+6. Install the .whl file using pip:
+
+.. code-block:: bash
+
+    pip install dist/airflow_operators-0.0.0-py3-none-any.whl
+
+7. The package is now ready to use!
+
+.. code-block:: pycon
+
+  >>> import airflow_operators
+  Hello from airflow_operators
+  >>>
+
+The package can be removed using the ``pip`` command:
+
+.. code-block:: bash
+
+    pip uninstall airflow_operators
+
+For more details, see: https://packaging.python.org/tutorials/packaging-projects/
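
As a rough illustration of the ``sys.path`` behaviour described in the diff above, the following sketch creates a throwaway package in a temporary directory and checks that it only becomes importable once its parent directory is appended to ``sys.path``. The package name ``demo_airflow_operators`` and its ``GREETING`` attribute are made up for this example and are not part of the pull request; the sketch uses only the standard library and does not touch Airflow itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PACKAGE_NAME = "demo_airflow_operators"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create <tmp_dir>/demo_airflow_operators/__init__.py
    package_dir = Path(tmp_dir) / PACKAGE_NAME
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text(
        'GREETING = "Hello from demo_airflow_operators"\n'
    )

    # The parent directory is not on sys.path yet, so the import fails.
    try:
        importlib.import_module(PACKAGE_NAME)
        importable_before = True
    except ModuleNotFoundError:
        importable_before = False

    # Appending the parent directory makes the package discoverable.
    sys.path.append(tmp_dir)
    importlib.invalidate_caches()
    module = importlib.import_module(PACKAGE_NAME)
    importable_after = True
    greeting = module.GREETING

    # Undo the changes so the demo leaves the interpreter state as it found it.
    sys.path.remove(tmp_dir)
    sys.modules.pop(PACKAGE_NAME, None)

print(importable_before, importable_after, greeting)
```

Appending to ``sys.path`` at runtime is convenient for experiments like this one. For code that should be available in every session, installing a package or setting ``PYTHONPATH``, as the quoted document goes on to describe, is usually the more predictable choice.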

Review comment:
       Thank you.



