Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/11 10:15:22 UTC

[GitHub] [airflow] potiuk commented on a diff in pull request #28300: Add Public Interface description to Airflow documentation

potiuk commented on code in PR #28300:
URL: https://github.com/apache/airflow/pull/28300#discussion_r1066811953


##########
docs/apache-airflow/public-airflow-interface.rst:
##########
@@ -0,0 +1,374 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Public Interface of Airflow
+...........................
+
+The Public Interface of Apache Airflow is a set of interfaces that allow developers to interact
+with and access certain features of the Apache Airflow system. This includes operations such as
+creating and managing DAGs (directed acyclic graphs), managing tasks and their dependencies,
+and extending Airflow capabilities by writing new executors, plugins, operators and providers. The
+Public Interface can be useful for building custom tools and integrations with other systems,
+and for automating certain aspects of the Airflow workflow.
+
+Using Airflow Public Interfaces
+===============================
+
+You need to use Airflow Public Interfaces whenever you want to interact with Airflow programmatically:
+
+* When you are writing new (or extending existing) custom Python classes (Operators, Hooks) - the basic
+  building blocks of DAGs. This can be done by DAG Authors to add functionality missing in their DAGs,
+  or by those who write reusable custom operators for other DAG authors.
+* When writing new :doc:`Plugins <authoring-and-scheduling/plugins>` that extend Airflow's functionality beyond
+  DAG building blocks. Secrets, Timetables, Triggers, Listeners are all examples of such functionality. This
+  is usually done by users who manage Airflow instances.
+* Bundling custom Operators, Hooks, Plugins and releasing them together via
+  :doc:`provider packages <apache-airflow-providers:index>` - this is usually done by those who intend to
+  provide a reusable set of functionality for an external service or application that Airflow integrates with.
+
+All the ways above involve extending or using Airflow Python classes and functions. The classes
+and functions mentioned below can be relied upon to keep backwards-compatible signatures and behaviours
+within a MAJOR version of Airflow. On the other hand, classes and methods starting with ``_`` (also known
+as protected Python methods) and ``__`` (also known as private Python methods) are not part of the Public
+Airflow Interface and might change at any time.
+
+You can also use Airflow's Public Interface via the `Stable REST API <stable-rest-api-ref>`_ (based on the
+OpenAPI specification). For specific needs you can also use the
+`Airflow Command Line Interface (CLI) <cli-and-env-variables-ref.rst>`_, though its behaviour might change
+in details (such as output format and available flags), so if you want to rely on those in a programmatic
+way, the Stable REST API is recommended.
+
+
+Using Public Interface by DAG Authors
+=====================================
+
+DAGs
+----
+
+The DAG is Airflow's core entity that represents a recurring workflow. You can create a DAG by
+instantiating the :class:`~airflow.models.dag.DAG` class in your DAG file.
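+
+For illustration, a minimal DAG file could look like the following sketch (the ``dag_id``, schedule
+and command are purely hypothetical):
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.models.dag import DAG
+    from airflow.operators.bash import BashOperator
+
+    with DAG(
+        dag_id="my_first_dag",  # hypothetical DAG id
+        start_date=datetime(2023, 1, 1),
+        schedule="@daily",
+        catchup=False,
+    ) as dag:
+        hello = BashOperator(task_id="hello", bash_command="echo hello")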
+
+Airflow has a set of Example DAGs that you can use to learn how to write DAGs.
+
+.. toctree::
+  :includehidden:
+  :glob:
+  :maxdepth: 1
+
+  _api/airflow/example_dags/index
+
+You can read more about DAGs in :doc:`DAGs <core-concepts/dags>`.
+
+.. _pythonapi:operators:
+
+Operators
+---------
+
+Operators allow for the generation of certain types of tasks that become nodes in
+the DAG when instantiated.
+
+There are 3 main types of operators:
+
+- Operators that perform an **action**, or tell another system to
+  perform an action
+- **Transfer** operators move data from one system to another
+- **Sensors** are a certain type of operator that will keep running until a
+  certain criterion is met. Examples include a specific file landing in HDFS or
+  S3, a partition appearing in Hive, or a specific time of the day. Sensors
+  are derived from :class:`~airflow.sensors.base.BaseSensorOperator` and run a poke
+  method at a specified :attr:`~airflow.sensors.base.BaseSensorOperator.poke_interval` until it
+  returns ``True`` (a minimal sensor sketch follows this list).
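+
+As a minimal sketch (the class name and file path are hypothetical), a custom sensor only needs
+to implement the ``poke`` method:
+
+.. code-block:: python
+
+    import os
+
+    from airflow.sensors.base import BaseSensorOperator
+
+
+    class FileExistsSensor(BaseSensorOperator):
+        """Hypothetical sensor that waits until a local file appears."""
+
+        def __init__(self, *, path: str, **kwargs):
+            super().__init__(**kwargs)
+            self.path = path
+
+        def poke(self, context) -> bool:
+            # Called every ``poke_interval`` seconds; returning True completes the task.
+            return os.path.exists(self.path)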
+
+All operators are derived from :class:`~airflow.models.baseoperator.BaseOperator` and acquire much
+functionality through inheritance. Since this is the core of the engine,
+it's worth taking the time to understand the parameters of :class:`~airflow.models.baseoperator.BaseOperator`
+to understand the primitive features that can be leveraged in your DAGs.
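+
+A sketch of a custom operator built on :class:`~airflow.models.baseoperator.BaseOperator` (the
+operator name and logic are illustrative only):
+
+.. code-block:: python
+
+    from airflow.models.baseoperator import BaseOperator
+
+
+    class HelloOperator(BaseOperator):
+        """Hypothetical operator that logs a greeting."""
+
+        def __init__(self, *, name: str, **kwargs):
+            # Common parameters such as ``retries`` are handled by BaseOperator.
+            super().__init__(**kwargs)
+            self.name = name
+
+        def execute(self, context):
+            self.log.info("Hello %s", self.name)
+            return self.name  # returned values are pushed to XCom by default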
+
+Airflow has a set of Operators that are considered public. You are also free to extend their functionality
+by extending them:
+
+.. toctree::
+  :includehidden:
+  :glob:
+  :maxdepth: 1
+
+  _api/airflow/operators/index
+
+  _api/airflow/sensors/index
+
+
+You can read more about operators in :doc:`core-concepts/operators` and :doc:`core-concepts/sensors`.
+You can also learn how to write a custom operator in :doc:`howto/custom-operator`.
+
+.. _pythonapi:hooks:
+
+Hooks
+-----
+
+Hooks are interfaces to external platforms and databases, implementing a common
+interface when possible and acting as building blocks for operators. All hooks
+are derived from :class:`~airflow.hooks.base.BaseHook`.
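+
+A minimal sketch of a custom hook (the connection id and URL scheme are assumptions for
+illustration):
+
+.. code-block:: python
+
+    from airflow.hooks.base import BaseHook
+
+
+    class MyServiceHook(BaseHook):
+        """Hypothetical hook that builds a base URL from an Airflow Connection."""
+
+        def __init__(self, my_conn_id: str = "my_service_default"):
+            super().__init__()
+            self.my_conn_id = my_conn_id
+
+        def get_conn(self) -> str:
+            # Host and port come from the Connection store, not from DAG code.
+            conn = self.get_connection(self.my_conn_id)
+            return f"https://{conn.host}:{conn.port}"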
+
+Airflow has a set of Hooks that are considered public. You are free to extend their functionality
+by extending them:
+
+.. toctree::
+  :includehidden:
+  :glob:
+  :maxdepth: 1
+
+  _api/airflow/hooks/index
+
+Public Airflow utilities
+------------------------
+
+When writing or extending Hooks and Operators, DAG authors and developers can
+use the following classes:
+
+* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.models.xcom.XCom`, which is used to access inter-task communication data.
+
+You can read more about the public Airflow utilities in :doc:`howto/connection`,
+:doc:`core-concepts/variables`, and :doc:`core-concepts/xcoms`.
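+
+A short sketch of how these utilities are typically used (the variable key and connection id
+below are hypothetical):
+
+.. code-block:: python
+
+    from airflow.hooks.base import BaseHook
+    from airflow.models import Variable
+
+    # Read an Airflow Variable, falling back to a default if it is not set.
+    environment = Variable.get("environment", default_var="dev")
+
+    # Resolve a stored Connection by its conn_id.
+    conn = BaseHook.get_connection("my_service_default")
+    print(conn.host, conn.login)
+
+    # Inside a running task, XCom values are usually pulled via the task instance:
+    # context["ti"].xcom_pull(task_ids="other_task")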
+
+Public Exceptions
+-----------------
+
+When writing custom Operators and Hooks, you can handle and raise the public Exceptions that Airflow
+exposes:
+
+.. toctree::
+  :includehidden:
+  :glob:
+  :maxdepth: 1
+
+  _api/airflow/exceptions/index
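+
+For instance (the function and its input are hypothetical), these exceptions can be used to
+control task state:
+
+.. code-block:: python
+
+    from airflow.exceptions import AirflowException, AirflowSkipException
+
+
+    def process(records):
+        if not records:
+            # AirflowSkipException marks the task as skipped rather than failed.
+            raise AirflowSkipException("No records to process")
+        if any(record is None for record in records):
+            # AirflowException fails the task with a descriptive message.
+            raise AirflowException("Received malformed records")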
+
+
+Using Public Interface to extend Airflow capabilities
+=====================================================
+
+Airflow uses the Plugin mechanism to extend the capabilities of the platform. Plugins allow you to
+extend the Airflow UI, but they are also the way to expose the customizations described below
+(Triggers, Timetables, Listeners). Providers can also implement plugin endpoints to customize the
+Airflow UI and to deliver those customizations.
+
+You can read more about plugins in :doc:`authoring-and-scheduling/plugins`. You can read how to extend
+Airflow UI in :doc:`howto/custom-view-plugin`. Note that there are some simple customizations of the UI
+that do not require plugins - you can read more about them in :doc:`howto/customize-ui`.
+
+Here are the ways in which Plugins can be used to extend Airflow:
+
+Triggers
+--------
+
+Airflow can be configured to use Triggers in order to implement ``asyncio``-compatible Deferrable Operators.
+All Triggers derive from :class:`~airflow.triggers.base.BaseTrigger`.
+
+Airflow has a set of Triggers that are considered public. You are free to extend their functionality
+by extending them:
+
+.. toctree::
+  :includehidden:
+  :glob:
+  :maxdepth: 1
+
+  _api/airflow/triggers/index
+
+You can read more about Triggers in :doc:`authoring-and-scheduling/deferring`.
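+
+A sketch of a custom trigger, loosely modeled on Airflow's built-in ``DateTimeTrigger`` (the
+module path used in ``serialize`` is hypothetical):
+
+.. code-block:: python
+
+    import asyncio
+    from datetime import datetime, timezone
+
+    from airflow.triggers.base import BaseTrigger, TriggerEvent
+
+
+    class MomentReachedTrigger(BaseTrigger):
+        """Hypothetical trigger that fires once a given moment has passed."""
+
+        def __init__(self, moment: datetime):
+            super().__init__()
+            self.moment = moment
+
+        def serialize(self):
+            # Triggers must be serializable so the triggerer process can re-create them.
+            return ("my_package.triggers.MomentReachedTrigger", {"moment": self.moment})
+
+        async def run(self):
+            # Runs in the triggerer's event loop, so it must never block.
+            while datetime.now(timezone.utc) < self.moment:
+                await asyncio.sleep(1)
+            yield TriggerEvent(self.moment)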
+
+Timetables
+----------
+
+Custom timetable implementations provide Airflow's scheduler additional logic to
+schedule DAG runs in ways not possible with built-in schedule expressions.
+All Timetables derive from :class:`~airflow.timetables.base.Timetable`.
+
+Airflow has a set of Timetables that are considered public. You are free to extend their functionality
+by extending them:
+
+.. toctree::
+  :includehidden:
+  :maxdepth: 1
+
+  _api/airflow/timetables/index
+
+You can read more about Timetables in :doc:`howto/timetable`.
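+
+As a purely illustrative sketch (a built-in ``@daily`` schedule would normally cover this case),
+a custom timetable implements two methods:
+
+.. code-block:: python
+
+    from datetime import timedelta
+    from typing import Optional
+
+    from pendulum import UTC, DateTime, Time
+
+    from airflow.timetables.base import DagRunInfo, DataInterval, Timetable
+
+
+    class DailyUtcTimetable(Timetable):
+        """Hypothetical timetable: one run per day, covering the previous UTC day."""
+
+        def infer_manual_data_interval(self, *, run_after: DateTime) -> DataInterval:
+            # A manual run covers the 24 hours before the most recent UTC midnight.
+            end = DateTime.combine(run_after.date(), Time.min).replace(tzinfo=UTC)
+            return DataInterval(start=end - timedelta(days=1), end=end)
+
+        def next_dagrun_info(
+            self, *, last_automated_data_interval, restriction
+        ) -> Optional[DagRunInfo]:
+            if last_automated_data_interval is not None:
+                next_start = last_automated_data_interval.end
+            elif restriction.earliest is None:
+                return None  # no start_date was set, so never schedule
+            else:
+                next_start = DateTime.combine(
+                    restriction.earliest.date(), Time.min
+                ).replace(tzinfo=UTC)
+            if restriction.latest is not None and next_start > restriction.latest:
+                return None  # we are past the DAG's end_date
+            return DagRunInfo.interval(start=next_start, end=next_start + timedelta(days=1))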
+
+Listeners
+---------
+
+Listeners are the way that the Airflow platform as a whole can be extended to respond to DAG/Task
+lifecycle events.
+
+This is implemented via the :class:`~airflow.listeners.listener.ListenerManager` class, which provides
+hooks that can be implemented to respond to DAG/Task lifecycle events.
+
+.. versionadded:: 2.5
+
+   Listener public interface has been added in version 2.5.
+
+You can read more about Listeners in :doc:`administration-and-deployment/listeners`.
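+
+A minimal sketch of a listener and its registration (the module and plugin names are hypothetical):
+
+.. code-block:: python
+
+    # my_package/listeners.py (hypothetical module)
+    from airflow.listeners import hookimpl
+
+
+    @hookimpl
+    def on_task_instance_success(previous_state, task_instance, session):
+        # Invoked by the ListenerManager when a task instance succeeds.
+        print(f"Task {task_instance.dag_id}.{task_instance.task_id} succeeded")
+
+
+    # my_package/plugin.py (hypothetical module)
+    from airflow.plugins_manager import AirflowPlugin
+
+    from my_package import listeners
+
+
+    class MyListenerPlugin(AirflowPlugin):
+        name = "my_listener_plugin"
+        listeners = [listeners]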
+
+Extra Links
+-----------
+
+Extra Links are dynamic links that can be added to Airflow independently of custom Operators. Normally
+they are defined by Operators, but plugins allow you to override the links on a global level.
+
+You can read more about the Extra Links in :doc:`/howto/define-extra-link`.
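+
+A sketch of a globally applied extra link (the link name and URL are hypothetical):
+
+.. code-block:: python
+
+    from airflow.models.baseoperator import BaseOperatorLink
+    from airflow.plugins_manager import AirflowPlugin
+
+
+    class MonitoringLink(BaseOperatorLink):
+        """Hypothetical link from every task to an external monitoring dashboard."""
+
+        name = "Monitoring"
+
+        def get_link(self, operator, *, ti_key):
+            return f"https://monitoring.example.com/{ti_key.dag_id}/{ti_key.task_id}"
+
+
+    class MonitoringLinkPlugin(AirflowPlugin):
+        name = "monitoring_link_plugin"
+        # Attach the link to every operator, independently of its class.
+        global_operator_extra_links = [MonitoringLink()]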
+
+Using Public Interface to integrate with external services and applications
+===========================================================================
+
+In order to integrate Airflow with external services and applications you can develop Hooks and Operators,
+similarly to DAG Authors, but you can also extend the core functionality of Airflow by integrating it
+with those external services. This can be done by simply adding the extension classes to the PYTHONPATH
+on your system and configuring Airflow to use them (there is no need to use the plugin mechanism). However,
+you can also package and release the Hooks, Operators and core extensions in the form of
+:doc:`provider packages <apache-airflow-providers:index>`. You can read more about core extensions
+delivered by providers via :doc:`provider packages <apache-airflow-providers:core-extensions/index>`.
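+
+As a hedged sketch, a provider package exposes its metadata to Airflow through an
+``apache_airflow_provider`` entry point pointing at a ``get_provider_info`` function (the package
+name below is hypothetical):
+
+.. code-block:: python
+
+    # airflow_provider_myservice/__init__.py (hypothetical package)
+    def get_provider_info():
+        return {
+            "package-name": "airflow-provider-myservice",
+            "name": "My Service",
+            "description": "Hooks and operators for My Service.",
+            "versions": ["1.0.0"],
+        }
+
+    # Wired up in the package's setup.cfg:
+    # [options.entry_points]
+    # apache_airflow_provider=
+    #   provider_info=airflow_provider_myservice:get_provider_info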

Review Comment:
   Yes it is that. But I would also leave the links to the detailed documents of the providers and the "core-extensions" provided by the providers. I find it really important to inter-link our documentation and refer to more details when we are explaining something, as that is how the reader can easily look for more information.
   
   I will add a fixup after I bulk-merge your corrections.


