Posted to commits@airflow.apache.org by ka...@apache.org on 2021/05/15 00:51:13 UTC

[airflow] branch master updated: Move common pitfall documentation to Airflow docs (#15183)

This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
     new 38ecbd6  Move common pitfall documentation to Airflow docs (#15183)
38ecbd6 is described below

commit 38ecbd6769ecee1136b653c17c2c5c2b17937818
Author: Alan Ma <al...@gmail.com>
AuthorDate: Fri May 14 17:50:42 2021 -0700

    Move common pitfall documentation to Airflow docs (#15183)
    
    Reviewed Common Pitfall page and migrated to Airflow documentation. I also added more common pitfalls.
    
    closes: #10180
---
 airflow/config_templates/config.yml              |  16 +-
 airflow/config_templates/default_airflow.cfg     |  16 +-
 docs/apache-airflow/best-practices.rst           |  18 +-
 docs/apache-airflow/concepts/index.rst           |   1 +
 docs/apache-airflow/concepts/pools.rst           |  20 +-
 docs/apache-airflow/concepts/priority-weight.rst |  61 ++++++
 docs/apache-airflow/concepts/scheduler.rst       |   2 +
 docs/apache-airflow/executor/celery.rst          |   2 +
 docs/apache-airflow/faq.rst                      | 254 +++++++++++++++++++----
 docs/apache-airflow/howto/custom-operator.rst    |   7 +
 docs/apache-airflow/macros-ref.rst               |   2 +
 11 files changed, 334 insertions(+), 65 deletions(-)

diff --git a/airflow/config_templates/config.yml b/airflow/config_templates/config.yml
index 8ed3188..39d2539 100644
--- a/airflow/config_templates/config.yml
+++ b/airflow/config_templates/config.yml
@@ -158,17 +158,19 @@
       default: ~
     - name: parallelism
       description: |
-        The amount of parallelism as a setting to the executor. This defines
-        the max number of task instances that should run simultaneously
-        on this airflow installation
+        This defines the maximum number of task instances that can run concurrently in Airflow
+        regardless of scheduler count and worker count. Generally, this value reflects the number
+        of task instances in the ``running`` state in the metadata database.
       version_added: ~
       type: string
       example: ~
       default: "32"
     - name: dag_concurrency
       description: |
-        The number of task instances allowed to run concurrently by the scheduler
-        in one DAG. Can be overridden by ``concurrency`` on DAG level.
+        The maximum number of task instances allowed to run concurrently in each DAG. To calculate
+        the number of tasks that are running concurrently for a DAG, add up the number of running
+        tasks for all DAG runs of the DAG. This is configurable at the DAG level with ``concurrency``,
+        which defaults to ``dag_concurrency``.
       version_added: ~
       type: string
       example: ~
@@ -182,7 +184,9 @@
       default: "True"
     - name: max_active_runs_per_dag
       description: |
-        The maximum number of active DAG runs per DAG
+        The maximum number of active DAG runs per DAG. The scheduler will not create more DAG runs
+        if it reaches the limit. This is configurable at the DAG level with ``max_active_runs``,
+        which defaults to ``max_active_runs_per_dag``.
       version_added: ~
       type: string
       example: ~
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 28c60c2..bf033ef 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -108,19 +108,23 @@ sql_alchemy_schema =
 # See https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine.params.connect_args
 # sql_alchemy_connect_args =
 
-# The amount of parallelism as a setting to the executor. This defines
-# the max number of task instances that should run simultaneously
-# on this airflow installation
+# This defines the maximum number of task instances that can run concurrently in Airflow
+# regardless of scheduler count and worker count. Generally, this value reflects the number
+# of task instances in the ``running`` state in the metadata database.
 parallelism = 32
 
-# The number of task instances allowed to run concurrently by the scheduler
-# in one DAG. Can be overridden by ``concurrency`` on DAG level.
+# The maximum number of task instances allowed to run concurrently in each DAG. To calculate
+# the number of tasks that are running concurrently for a DAG, add up the number of running
+# tasks for all DAG runs of the DAG. This is configurable at the DAG level with ``concurrency``,
+# which defaults to ``dag_concurrency``.
 dag_concurrency = 16
 
 # Are DAGs paused by default at creation
 dags_are_paused_at_creation = True
 
-# The maximum number of active DAG runs per DAG
+# The maximum number of active DAG runs per DAG. The scheduler will not create more DAG runs
+# if it reaches the limit. This is configurable at the DAG level with ``max_active_runs``,
+# which defaults to ``max_active_runs_per_dag``.
 max_active_runs_per_dag = 16
 
 # Whether to load the DAG examples that ship with Airflow. It's good to
diff --git a/docs/apache-airflow/best-practices.rst b/docs/apache-airflow/best-practices.rst
index 4827036..b2ae4ae 100644
--- a/docs/apache-airflow/best-practices.rst
+++ b/docs/apache-airflow/best-practices.rst
@@ -15,6 +15,8 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _best_practice:
+
 Best Practices
 ==============
 
@@ -25,11 +27,19 @@ Creating a new DAG is a two-step process:
 
 This tutorial will introduce you to the best practices for these two steps.
 
+.. _best_practice:writing_a_dag:
+
 Writing a DAG
 ^^^^^^^^^^^^^^
+
 Creating a new DAG in Airflow is quite simple. However, there are many things that you need to take care of
 to ensure the DAG run or failure does not produce unexpected results.
 
+Creating a Custom Operator/Hook
+-------------------------------
+
+Please follow our guide on :ref:`custom Operators <custom_operator>`.
+
 Creating a task
 ---------------
 
@@ -54,7 +64,6 @@ Some of the ways you can avoid producing a different result -
     You should define repetitive parameters such as ``connection_id`` or S3 paths in ``default_args`` rather than declaring them for each task.
     The ``default_args`` help to avoid mistakes such as typographical errors.
 
-
 Deleting a task
 ----------------
 
@@ -101,9 +110,12 @@ or if you need to deserialize a json object from the variable :
     {{ var.json.<variable_name> }}
 
 
-.. note::
+Top level Python Code
+---------------------
 
-    In general, you should not write any code outside the tasks. The code outside the tasks runs every time Airflow parses the DAG, which happens every second by default.
+In general, you should not write any code outside of defining Airflow constructs such as Operators. Code outside of
+tasks runs every time Airflow parses an eligible Python file, which happens as often as every
+:ref:`min_file_process_interval<config:scheduler__min_file_process_interval>` seconds.
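+
+As a short, hedged sketch (``expensive_api_call`` is a hypothetical helper, not an Airflow API), keep expensive work
+out of module scope and inside the callable that the task actually executes:
+
+.. code-block:: python
+
+    import time
+
+    from airflow.decorators import task
+
+
+    def expensive_api_call():
+        # Hypothetical helper standing in for a slow network or database call.
+        time.sleep(30)
+        return "payload"
+
+
+    # Anti-pattern: calling this at module level would run on every DAG file parse.
+    # data = expensive_api_call()
+
+
+    @task
+    def process_data():
+        # Recommended: the expensive call only runs when the task instance executes.
+        return expensive_api_call()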
 
 
 Testing a DAG
diff --git a/docs/apache-airflow/concepts/index.rst b/docs/apache-airflow/concepts/index.rst
index c635f87..677bd7f 100644
--- a/docs/apache-airflow/concepts/index.rst
+++ b/docs/apache-airflow/concepts/index.rst
@@ -42,6 +42,7 @@ Here you can find detailed documentation about each one of Airflow's core concep
     ../executor/index
     scheduler
     pools
+    priority-weight
     cluster-policies
 
 **Communication**
diff --git a/docs/apache-airflow/concepts/pools.rst b/docs/apache-airflow/concepts/pools.rst
index 482f82b..5b10fdd 100644
--- a/docs/apache-airflow/concepts/pools.rst
+++ b/docs/apache-airflow/concepts/pools.rst
@@ -15,10 +15,14 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _concepts:pool:
+
 Pools
 =====
 
-Some systems can get overwhelmed when too many processes hit them at the same time. Airflow pools can be used to **limit the execution parallelism** on arbitrary sets of tasks. The list of pools is managed in the UI (``Menu -> Admin -> Pools``) by giving the pools a name and assigning it a number of worker slots.
+Some systems can get overwhelmed when too many processes hit them at the same time. Airflow pools can be used to
+**limit the execution parallelism** on arbitrary sets of tasks. The list of pools is managed in the UI
+(``Menu -> Admin -> Pools``) by giving each pool a name and assigning it a number of worker slots.
 
 Tasks can then be associated with one of the existing pools by using the ``pool`` parameter when creating tasks:
 
@@ -33,14 +37,16 @@ Tasks can then be associated with one of the existing pools by using the ``pool`
     )
     aggregate_db_message_job.set_upstream(wait_for_empty_queue)
 
-The ``pool`` parameter can be used in conjunction with the ``priority_weight`` parameter to define priorities in the queue, and which tasks get executed first as slots open up in the pool.
-
-The default ``priority_weight`` is ``1``, and can be bumped to any number. When sorting the queue to evaluate which task should be executed next, we use the ``priority_weight``, summed up with all of the ``priority_weight`` values from tasks downstream from this task; the highest summed value wins. Thus, you can bump a specific important task, and the whole path to that task gets prioritized accordingly.
 
-Tasks will be scheduled as usual while the slots fill up. Once capacity is reached, runnable tasks get queued and their state will show as such in the UI. As slots free up, queued tasks start running based on the ``priority_weight`` (of the task and its descendants).
+Tasks will be scheduled as usual while the slots fill up. The number of slots occupied by a task can be configured by
+``pool_slots``. Once capacity is reached, runnable tasks get queued and their state will show as such in the UI.
+As slots free up, queued tasks start running based on the :ref:`concepts:priority-weight` of the task and its
+descendants.
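+
+As a hedged sketch (the task and pool names are illustrative, and the ``backfill_pool`` pool is assumed to already
+exist), a heavier task can occupy several slots at once:
+
+.. code-block:: python
+
+    from airflow.operators.bash import BashOperator
+
+    heavy_aggregation = BashOperator(
+        task_id='heavy_aggregation',
+        bash_command='echo running heavy aggregation',
+        pool='backfill_pool',  # assumed to have been created in Menu -> Admin -> Pools
+        pool_slots=3,          # this task occupies 3 slots of the pool while it runs
+        dag=dag,               # assumes a DAG object named ``dag``, as in the example above
+    )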
 
-Note that if tasks are not given a pool, they are assigned to a default pool ``default_pool``.  ``default_pool`` is initialized with 128 slots and can be modified through the UI or CLI (but cannot be removed).
+Note that if tasks are not given a pool, they are assigned to a default pool ``default_pool``.  ``default_pool`` is
+initialized with 128 slots and can be modified through the UI or CLI (but cannot be removed).
 
 .. warning::
 
-    Pools and SubDAGs do not interact as you might first expect. SubDAGs will *not* honor any pool you set on them at the top level; pools must be set on the tasks *inside* the SubDAG directly.
+    Pools and SubDAGs do not interact as you might first expect. SubDAGs will *not* honor any pool you set on them at
+    the top level; pools must be set on the tasks *inside* the SubDAG directly.
diff --git a/docs/apache-airflow/concepts/priority-weight.rst b/docs/apache-airflow/concepts/priority-weight.rst
new file mode 100644
index 0000000..b8b1775
--- /dev/null
+++ b/docs/apache-airflow/concepts/priority-weight.rst
@@ -0,0 +1,61 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. _concepts:priority-weight:
+
+Priority Weights
+================
+
+``priority_weight`` defines priorities in the executor queue. The default ``priority_weight`` is ``1``, and can be
+bumped to any integer. Moreover, each task has an effective ``priority_weight`` that is calculated from its
+``weight_rule``, which defines the weighting method used to compute the task's total priority weight.
+
+By default, Airflow's weighting method is ``downstream``. You can find other weighting methods in
+:class:`airflow.utils.WeightRule`.
+
+There are three weighting methods.
+
+- downstream
+
+  The effective weight of the task is the aggregate sum of the
+  weights of all downstream descendants. As a result, upstream
+  tasks will have higher weight and will be scheduled more
+  aggressively when using positive weight values. This is useful
+  when you have multiple DAG run instances and want all upstream
+  tasks to complete for all runs before each DAG run can continue
+  processing downstream tasks.
+
+- upstream
+
+  The effective weight is the aggregate sum of the weights of all
+  upstream ancestors. This is the opposite: downstream tasks have
+  higher weight and will be scheduled more aggressively when using
+  positive weight values. This is useful when you have multiple DAG
+  run instances and prefer to have each DAG run complete before
+  starting upstream tasks of other DAG runs.
+
+- absolute
+
+  The effective weight is the exact ``priority_weight`` specified
+  without additional weighting. You may want to do this when you
+  know exactly what priority weight each task should have.
+  Additionally, when set to ``absolute``, there is a bonus effect of
+  significantly speeding up the task creation process for very
+  large DAGs.
+
+
+The ``priority_weight`` parameter can be used in conjunction with :ref:`concepts:pool`.
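+
+As a hedged sketch (the task name, weight, and DAG object are illustrative), both parameters are set directly on an
+operator:
+
+.. code-block:: python
+
+    from airflow.operators.bash import BashOperator
+
+    critical_step = BashOperator(
+        task_id='critical_step',
+        bash_command='echo critical work',
+        priority_weight=10,        # base weight for this task
+        weight_rule='downstream',  # the default; the effective weight also sums the weights of downstream tasks
+        dag=dag,                   # assumes a DAG object named ``dag`` is defined elsewhere
+    )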
diff --git a/docs/apache-airflow/concepts/scheduler.rst b/docs/apache-airflow/concepts/scheduler.rst
index 9febac3..6ea5ff2 100644
--- a/docs/apache-airflow/concepts/scheduler.rst
+++ b/docs/apache-airflow/concepts/scheduler.rst
@@ -15,6 +15,8 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _scheduler:
+
 Scheduler
 ==========
 
diff --git a/docs/apache-airflow/executor/celery.rst b/docs/apache-airflow/executor/celery.rst
index 154077e..a437245 100644
--- a/docs/apache-airflow/executor/celery.rst
+++ b/docs/apache-airflow/executor/celery.rst
@@ -195,6 +195,8 @@ During this process, two 2 process are created:
 | [11] **WorkerProcess** saves status information in **ResultBackend**.
 | [13] When **SchedulerProcess** asks **ResultBackend** again about the status, it will get information about the status of the task.
 
+.. _executor:CeleryExecutor:queue:
+
 Queues
 ------
 
diff --git a/docs/apache-airflow/faq.rst b/docs/apache-airflow/faq.rst
index b6bb2cf..5122312 100644
--- a/docs/apache-airflow/faq.rst
+++ b/docs/apache-airflow/faq.rst
@@ -15,16 +15,18 @@
     specific language governing permissions and limitations
     under the License.
 
-
+.. _faq:
 
 FAQ
 ========
 
-Why isn't my task getting scheduled?
-------------------------------------
+Scheduling
+^^^^^^^^^^
+
+Why is my task not getting scheduled?
+-------------------------------------
 
-There are very many reasons why your task might not be getting scheduled.
-Here are some of the common causes:
+There are many reasons why your task might not be getting scheduled. Here are some of the common causes:
 
 - Does your script "compile", can the Airflow engine parse it and find your
   DAG object? To test this, you can run ``airflow dags list`` and
@@ -78,8 +80,38 @@ Here are some of the common causes:
 - Is the ``max_active_runs`` parameter of your DAG reached? ``max_active_runs`` defines
   how many ``running`` concurrent instances of a DAG there are allowed to be.
 
-You may also want to read the Scheduler section of the docs and make
-sure you fully understand how it proceeds.
+You may also want to read about the :ref:`scheduler` and make
+sure you fully understand how the scheduler cycle works.
+
+
+How to improve DAG performance?
+-------------------------------
+
+There are several Airflow configuration settings that allow for larger scheduling capacity and frequency:
+
+- :ref:`config:core__parallelism`
+- :ref:`config:core__dag_concurrency`
+- :ref:`config:core__max_active_runs_per_dag`
+
+DAGs have configurations that improve efficiency:
+
+- ``concurrency``: Overrides :ref:`config:core__dag_concurrency`.
+- ``max_active_runs``: Overrides :ref:`config:core__max_active_runs_per_dag`.
+
+Operators or tasks also have configurations that improve efficiency and scheduling priority:
+
+- ``task_concurrency``: This parameter controls the number of concurrently running task instances for a given task
+  across all DAG runs.
+- ``pool``: See :ref:`concepts:pool`.
+- ``priority_weight``: See :ref:`concepts:priority-weight`.
+- ``queue``: See :ref:`executor:CeleryExecutor:queue` for CeleryExecutor deployments only.
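+
+As a hedged sketch (the DAG id, values, and the ``api_pool`` pool are illustrative, not recommendations), several of
+these settings can be combined at the DAG and task level:
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow import DAG
+    from airflow.operators.bash import BashOperator
+
+    with DAG(
+        dag_id='performance_tuning_example',
+        start_date=datetime(2021, 1, 1),
+        schedule_interval='@daily',
+        concurrency=8,      # overrides core.dag_concurrency for this DAG
+        max_active_runs=2,  # overrides core.max_active_runs_per_dag for this DAG
+    ) as dag:
+        extract = BashOperator(
+            task_id='extract',
+            bash_command='echo extracting',
+            task_concurrency=1,  # at most one running instance of this task across DAG runs
+            pool='api_pool',     # assumes a pool named api_pool has been created
+            priority_weight=10,
+        )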
+
+
+How to reduce DAG scheduling latency / task delay?
+--------------------------------------------------
+
+Airflow 2.0 has low DAG scheduling latency out of the box (particularly when compared with Airflow 1.10.x);
+however, if you need more throughput you can :ref:`start multiple schedulers<scheduler:ha>`.
 
 
 How do I trigger tasks based on another task's failure?
@@ -87,6 +119,9 @@ How do I trigger tasks based on another task's failure?
 
 You can achieve this with :ref:`concepts:trigger-rules`.
 
+DAG construction
+^^^^^^^^^^^^^^^^
+
 What's the deal with ``start_date``?
 ------------------------------------
 
@@ -131,9 +166,30 @@ backfill CLI command, gets overridden by the backfill's ``start_date`` commands.
 This allows for a backfill on tasks that have ``depends_on_past=True`` to
 actually start. If this were not the case, the backfill just would not start.
 
-How can I create DAGs dynamically?
+
+What does ``execution_date`` mean?
 ----------------------------------
 
+Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if you want to
+summarize data for 2016-02-19, you would do it at 2016-02-20 midnight UTC, which would be right after all data for
+2016-02-19 becomes available.
+
+This datetime value is available to you as :ref:`Macros<macros:default_variables>` in various forms in Jinja templated
+fields. It is also included in the context dictionary given to an Operator's ``execute`` function.
+
+.. code-block:: python
+
+        import logging
+
+        from airflow.models import BaseOperator
+
+
+        class MyOperator(BaseOperator):
+
+            def execute(self, context):
+                logging.info(context['execution_date'])
+
+Note that ``ds`` refers to the date string, not the date start, which may be confusing to some.
+
+
+How to create DAGs dynamically?
+-------------------------------
+
 Airflow looks in your ``DAGS_FOLDER`` for modules that contain ``DAG`` objects
 in their global namespace and adds the objects it finds in the
 ``DagBag``. Knowing this, all we need is a way to dynamically assign
@@ -159,68 +215,142 @@ simple dictionary.
         other_dag_id = f'bar_{i}'
         globals()[other_dag_id] = create_dag(other_dag_id)
 
-What are all the ``airflow tasks run`` commands in my process list?
--------------------------------------------------------------------
+Even though Airflow supports multiple DAG definitions per Python file, dynamically generated or otherwise, it is not
+recommended, as Airflow benefits from better isolation between DAGs from a fault and deployment perspective, and
+multiple DAGs in the same file go against that.
 
-There are many layers of ``airflow tasks run`` commands, meaning it can call itself.
 
-- Basic ``airflow tasks run``: fires up an executor, and tell it to run an
-  ``airflow tasks run --local`` command. If using Celery, this means it puts a
-  command in the queue for it to run remotely on the worker. If using
-  LocalExecutor, that translates into running it in a subprocess pool.
-- Local ``airflow tasks run --local``: starts an ``airflow tasks run --raw``
-  command (described below) as a subprocess and is in charge of
-  emitting heartbeats, listening for external kill signals
-  and ensures some cleanup takes place if the subprocess fails.
-- Raw ``airflow tasks run --raw`` runs the actual operator's execute method and
-  performs the actual work.
+Is top level Python code allowed?
+---------------------------------
 
+While it is not recommended to write any code outside of defining Airflow constructs, Airflow does support
+arbitrary Python code as long as it does not break the DAG file processor or prolong file processing time past the
+:ref:`config:core__dagbag_import_timeout` value.
 
-How can my airflow dag run faster?
-----------------------------------
+A common example is violating the time limit when building a dynamic DAG, which usually requires querying data
+from another service like a database. At the same time, the queried service is swamped with requests from DAG file
+processors asking for the data needed to process the file. These unintended interactions may cause the service to
+deteriorate and eventually cause DAG file processing to fail.
 
-There are a few variables we can control to improve airflow dag performance:
+Refer to :ref:`DAG writing best practices<best_practice:writing_a_dag>` for more information.
 
-- ``parallelism``: This variable controls the number of task instances that runs simultaneously across the whole Airflow cluster. User could increase the ``parallelism`` variable in the ``airflow.cfg``.
-- ``concurrency``: The Airflow scheduler will run no more than ``concurrency`` task instances for your DAG at any given time. Concurrency is defined in your Airflow DAG. If you do not set the concurrency on your DAG, the scheduler will use the default value from the ``dag_concurrency`` entry in your ``airflow.cfg``.
-- ``task_concurrency``: This variable controls the number of concurrent running task instances across ``dag_runs`` per task.
-- ``max_active_runs``: the Airflow scheduler will run no more than ``max_active_runs`` DagRuns of your DAG at a given time. If you do not set the ``max_active_runs`` in your DAG, the scheduler will use the default value from the ``max_active_runs_per_dag`` entry in your ``airflow.cfg``.
-- ``pool``: This variable controls the number of concurrent running task instances assigned to the pool.
 
-How can we reduce the airflow UI page load time?
-------------------------------------------------
+Do Macros resolve in another Jinja template?
+--------------------------------------------
 
-If your dag takes long time to load, you could reduce the value of ``default_dag_run_display_number`` configuration in ``airflow.cfg`` to a smaller value. This configurable controls the number of dag run to show in UI with default value 25.
+It is not possible to render :ref:`Macros<macros>` or any Jinja template within another Jinja template. This is
+commonly attempted in ``user_defined_macros``.
 
+.. code-block:: python
 
-How to fix Exception: Global variable explicit_defaults_for_timestamp needs to be on (1)?
------------------------------------------------------------------------------------------
+        dag = DAG(
+            ...
+            user_defined_macros={
+                'my_custom_macro': 'day={{ ds }}'
+            }
+        )
 
-This means ``explicit_defaults_for_timestamp`` is disabled in your mysql server and you need to enable it by:
+        bo = BashOperator(
+            task_id='my_task',
+            bash_command="echo {{ my_custom_macro }}",
+            dag=dag
+        )
 
-#. Set ``explicit_defaults_for_timestamp = 1`` under the ``mysqld`` section in your ``my.cnf`` file.
-#. Restart the Mysql server.
+This will echo "day={{ ds }}" instead of "day=2020-01-01" for a dagrun with the execution date 2020-01-01 00:00:00.
 
+.. code-block:: python
 
-How to reduce airflow dag scheduling latency in production?
------------------------------------------------------------
+        bo = BashOperator(
+            task_id='my_task',
+            bash_command="echo day={{ ds }}",
+            dag=dag
+        )
+
+By using the ``ds`` macro directly in the templated field, the rendered value results in "day=2020-01-01".
 
-Airflow 2 has low DAG scheduling latency out of the box (particularly when compared with Airflow 1.10.x),
-however if you need more throughput you can :ref:`start multiple schedulers<scheduler:ha>`.
 
-Why next_ds or prev_ds might not contain expected values?
----------------------------------------------------------
+Why ``next_ds`` or ``prev_ds`` might not contain expected values?
+------------------------------------------------------------------
 
 - When scheduling DAG, the ``next_ds`` ``next_ds_nodash`` ``prev_ds`` ``prev_ds_nodash`` are calculated using
   ``execution_date`` and ``schedule_interval``. If you set ``schedule_interval`` as ``None`` or ``@once``,
   the ``next_ds``, ``next_ds_nodash``, ``prev_ds``, ``prev_ds_nodash`` values will be set to ``None``.
 - When manually triggering DAG, the schedule will be ignored, and ``prev_ds == next_ds == ds``
 
+
+Task execution interactions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+What does ``TemplateNotFound`` mean?
+-------------------------------------
+
+``TemplateNotFound`` errors are usually due to misalignment with user expectations when passing a path to an operator
+that triggers Jinja templating. A common occurrence is with :ref:`BashOperators<howto/operator:BashOperator>`.
+
+Another commonly missed fact is that the files are resolved relative to where the pipeline file lives. You can add
+other directories to the ``template_searchpath`` of the DAG object to allow for other non-relative locations.
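+
+As a hedged sketch (the paths and DAG id are illustrative, and ``/opt/scripts/run_me.sh`` is assumed to exist), a
+``bash_command`` ending in ``.sh`` is treated as a template file and resolved via ``template_searchpath``:
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow import DAG
+    from airflow.operators.bash import BashOperator
+
+    with DAG(
+        dag_id='template_searchpath_example',
+        start_date=datetime(2021, 1, 1),
+        schedule_interval=None,
+        template_searchpath=['/opt/scripts'],  # extra directory searched for template files
+    ) as dag:
+        run_script = BashOperator(
+            task_id='run_script',
+            bash_command='run_me.sh',  # rendered from /opt/scripts/run_me.sh instead of raising TemplateNotFound
+        )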
+
+
+How to trigger tasks based on another task's failure?
+-----------------------------------------------------
+
+For tasks that are related through dependency, you can set the ``trigger_rule`` to ``TriggerRule.ALL_FAILED`` if the
+task execution depends on the failure of ALL its upstream tasks or ``TriggerRule.ONE_FAILED`` for just one of the
+upstream tasks.
+
+.. code-block:: python
+
+    from airflow.decorators import dag, task
+    from airflow.exceptions import AirflowException
+    from airflow.utils.trigger_rule import TriggerRule
+
+    from datetime import datetime
+
+
+    @task
+    def a_func():
+        raise AirflowException
+
+
+    @task(
+        trigger_rule=TriggerRule.ALL_FAILED,
+    )
+    def b_func():
+        pass
+
+    @dag(
+        schedule_interval='@once',
+        start_date=datetime(2021, 1, 1)
+    )
+    def my_dag():
+        a = a_func()
+        b = b_func()
+
+        a >> b
+
+    dag = my_dag()
+
+See :ref:`concepts:trigger-rules` for more information.
+
+If the tasks are not related by dependency, you will need to :ref:`build a custom Operator<custom_operator>`.
+
+Airflow UI
+^^^^^^^^^^
+
 How do I stop the sync perms happening multiple times per webserver?
 --------------------------------------------------------------------
 
 Set the value of ``update_fab_perms`` configuration in ``airflow.cfg`` to ``False``.
 
+
+How to reduce the Airflow UI page load time?
+--------------------------------------------
+
+If your DAG takes a long time to load, you could reduce the value of the ``default_dag_run_display_number``
+configuration in ``airflow.cfg`` to a smaller value. This setting controls the number of DAG runs to show in the UI,
+with a default value of ``25``.
+
+
 Why did the pause dag toggle turn red?
 --------------------------------------
 
@@ -228,3 +358,41 @@ If pausing or unpausing a dag fails for any reason, the dag toggle will
 revert to its previous state and turn red. If you observe this behavior,
 try pausing the dag again, or check the console or server logs if the
 issue recurs.
+
+
+MySQL and MySQL variant Databases
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+What does "MySQL Server has gone away" mean?
+--------------------------------------------
+
+You may occasionally experience an ``OperationalError`` with the message "MySQL Server has gone away". This happens
+when the connection pool keeps connections open too long and you are handed an old connection that has expired. To
+ensure a valid connection, you can set :ref:`config:core__sql_alchemy_pool_recycle` so that connections are
+invalidated after that many seconds and new ones are created.
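+
+For example, a minimal ``airflow.cfg`` sketch (``1800`` is an illustrative value, not a recommendation):
+
+.. code-block:: text
+
+    [core]
+    sql_alchemy_pool_recycle = 1800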
+
+
+Does Airflow support extended ASCII or unicode characters?
+----------------------------------------------------------
+
+If you intend to use extended ASCII or Unicode characters in Airflow, you have to provide a connection string to
+the MySQL database that defines the charset explicitly.
+
+.. code-block:: text
+
+    sql_alchemy_conn = mysql://airflow@localhost:3306/airflow?charset=utf8
+
+Otherwise you will experience a ``UnicodeDecodeError`` thrown by ``WTForms`` templating and other Airflow modules,
+like the one below.
+
+.. code-block:: text
+
+   'ascii' codec can't decode byte 0xae in position 506: ordinal not in range(128)
+
+
+How to fix Exception: Global variable ``explicit_defaults_for_timestamp`` needs to be on (1)?
+---------------------------------------------------------------------------------------------
+
+This means ``explicit_defaults_for_timestamp`` is disabled on your MySQL server and you need to enable it:
+
+#. Set ``explicit_defaults_for_timestamp = 1`` under the ``mysqld`` section in your ``my.cnf`` file.
+#. Restart the MySQL server.
diff --git a/docs/apache-airflow/howto/custom-operator.rst b/docs/apache-airflow/howto/custom-operator.rst
index e0ad193..648ef73 100644
--- a/docs/apache-airflow/howto/custom-operator.rst
+++ b/docs/apache-airflow/howto/custom-operator.rst
@@ -15,6 +15,7 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _custom_operator:
 
 Creating a custom Operator
 ==========================
@@ -33,6 +34,12 @@ There are two methods that you need to override in a derived class:
 * Execute - The code to execute when the runner calls the operator. The method contains the
   airflow context as a parameter that can be used to read config values.
 
+.. note::
+
+    When implementing custom operators, do not perform any expensive operations in the ``__init__`` method. The
+    operators will be instantiated once per scheduler cycle per task using them, and making database calls can
+    significantly slow down scheduling and waste resources.
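+
+A minimal, hedged sketch of the recommended pattern (the hook, connection id, and operator name are illustrative, and
+the example assumes the postgres provider is installed) keeps ``__init__`` cheap and defers expensive work to
+``execute``:
+
+.. code-block:: python
+
+    from airflow.models import BaseOperator
+    from airflow.providers.postgres.hooks.postgres import PostgresHook
+
+
+    class MyDbOperator(BaseOperator):
+        def __init__(self, conn_id: str, **kwargs) -> None:
+            super().__init__(**kwargs)
+            # Cheap: only store plain configuration here.
+            self.conn_id = conn_id
+
+        def execute(self, context):
+            # Expensive work (creating hooks, opening connections) happens only at run time.
+            hook = PostgresHook(postgres_conn_id=self.conn_id)
+            return hook.get_first('SELECT 1')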
+
 Let's implement an example ``HelloOperator`` in a new file ``hello_operator.py``:
 
 .. code-block:: python
diff --git a/docs/apache-airflow/macros-ref.rst b/docs/apache-airflow/macros-ref.rst
index 8fb4cf6..4920785 100644
--- a/docs/apache-airflow/macros-ref.rst
+++ b/docs/apache-airflow/macros-ref.rst
@@ -25,6 +25,8 @@ Variables and macros can be used in templates (see the :ref:`concepts:jinja-temp
 The following come for free out of the box with Airflow.
 Additional custom macros can be added globally through :doc:`plugins`, or at a DAG level through the ``DAG.user_defined_macros`` argument.
 
+.. _macros:default_variables:
+
 Default Variables
 -----------------
 The Airflow engine passes a few variables by default that are accessible