Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/22 19:30:57 UTC

[GitHub] [airflow] o-nikolas commented on a diff in pull request #25780: Implement PythonPreexistingVirtualenvOperator

o-nikolas commented on code in PR #25780:
URL: https://github.com/apache/airflow/pull/25780#discussion_r951810315


##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+
+
+Handling Python dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Airflow has many Python dependencies, and sometimes they conflict with the dependencies that your
+task code expects. Since - by default - the Airflow environment is a single Python environment with a single
+set of dependencies, there may also be cases where some of your tasks require different dependencies than
+other tasks, and those dependencies conflict with each other.
+
+If you are using pre-defined Airflow Operators to talk to external services, there is not much choice, but usually those
+operators have dependencies that do not conflict with the basic Airflow dependencies. Airflow uses a constraints
+mechanism, which means that there is a "fixed" set of dependencies that the community guarantees Airflow can be
+installed with (including all community providers) without triggering conflicts. However, you can upgrade the providers
+independently, and the constraints do not limit you there, so the chance of a conflicting dependency is lower (you still
+have to test those dependencies). Therefore, when you are using pre-defined operators, chances are that you will have
+little to no problems with conflicting dependencies.
+
+However, when you are approaching Airflow in a more "modern way", where you use the TaskFlow API and most of
+your operators are written using custom Python code, or when you want to write your own Custom Operator,
+you might get to the point where the dependencies required by your custom code conflict with those
+of Airflow, or even that the dependencies of several of your Custom Operators conflict with each other.
+
+There are a number of strategies that can be employed to mitigate the problem. And while dealing with
+dependency conflicts in custom operators is difficult, it's actually quite a bit easier when it comes to
+the TaskFlow approach or (equivalently) using ``PythonVirtualenvOperator`` or
+``PreexistingPythonVirtualenvOperator``.
+
+Let's start with the strategies that are easiest to implement (but have some limits and overhead), and
+we will gradually move on to those strategies that require some changes in your Airflow deployment.
+
+Using PythonVirtualenvOperator
+------------------------------
+
+This is the simplest to use and the most limited strategy. The ``PythonVirtualenvOperator`` allows you to dynamically
+create a virtualenv that your Python callable function will execute in. In the modern
+TaskFlow approach described in :doc:`/tutorial_taskflow_api`, this can also be done by decorating
+your callable with the ``@task.virtualenv`` decorator (the recommended way of using the operator).
+Each :class:`airflow.operators.python.PythonVirtualenvOperator` task can
+have its own independent Python virtualenv and can specify a fine-grained set of requirements that need
+to be installed for that task to execute.
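+
+For example, a minimal sketch of a task running in a dynamically created virtualenv could look like the snippet
+below (the ``colorama`` requirement and the DAG parameters are purely illustrative):
+
+.. code-block:: python
+
+    import datetime
+
+    from airflow.decorators import dag, task
+
+
+    @dag(start_date=datetime.datetime(2022, 1, 1), schedule_interval=None, catchup=False)
+    def virtualenv_example():
+        @task.virtualenv(requirements=["colorama==0.4.0"], system_site_packages=False)
+        def print_in_color():
+            # Import the extra dependency inside the callable - it only exists in the venv.
+            from colorama import Fore
+
+            print(Fore.GREEN + "hello from a dynamically created virtualenv")
+
+        print_in_color()
+
+
+    virtualenv_example()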
+
+The operator takes care of:
+
+* creating the virtualenv based on your environment
+* serializing your Python callable and passing it to be executed by the virtualenv Python interpreter
+* executing it, retrieving the result of the callable, and pushing it via XCom if specified
+
+The benefits of the operator are:
+
+* There is no need to prepare the venv upfront. It will be dynamically created before the task is run, and
+  removed after it is finished, so there is nothing special (except having the virtualenv package in your
+  Airflow dependencies) needed to make use of multiple virtual environments.
+* You can run tasks with different sets of dependencies on the same workers - thus memory resources are
+  reused (though see below about the CPU overhead involved in creating the venvs).
+* In bigger installations, DAG Authors do not need to ask anyone to create the venvs for them.
+  As a DAG Author, you only need to have the virtualenv dependency installed, and you can specify and modify the
+  environments as you see fit.
+* No changes in deployment requirements - whether you use a local virtualenv, Docker, or Kubernetes,
+  the tasks will work without adding anything to your deployment.
+* No need to learn more about containers or Kubernetes as a DAG Author. Only knowledge of Python and requirements
+  is required to author DAGs this way.
+
+There are certain limitations and overhead introduced by the operator:
+
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The operator adds CPU, networking and elapsed time overhead for running each task - Airflow has
+  to re-create the virtualenv from scratch for each task.
+* The workers need to have access to PyPI or private repositories to install dependencies.
+* The dynamic creation of the virtualenv is prone to transient failures (for example, when your repo is not available
+  or when there is a networking issue with reaching the repository).
+* It's easy to fall into a "too dynamic" environment - since the dependencies you install might get upgraded
+  and their transitive dependencies might get independent upgrades, you might end up with a situation where
+  your task stops working because someone released a new version of a dependency, or you might fall
+  victim to a "supply chain" attack where a new version of a dependency becomes malicious.
+* The tasks are only isolated from each other by running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files, etc.
+
+
+Using PreexistingPythonVirtualenvOperator
+-----------------------------------------
+
+.. versionadded:: 2.4
+
+A bit more complex, but with significantly fewer overhead, security, and stability problems, is to use the
+:class:`airflow.operators.python.PreexistingPythonVirtualenvOperator`, or even better - decorating your callable with
+the ``@task.preexisting_virtualenv`` decorator. It requires, however, that the virtualenv you use is immutable
+by the task and prepared upfront in your environment (and available on all the workers in case your
+Airflow runs in a distributed environment). This way you avoid the overhead and problems of re-creating the
+virtual environment, but the environments have to be prepared and deployed together with the Airflow installation,
+so usually the people who manage the Airflow installation need to be involved (and in bigger installations those
+are usually different people than the DAG Authors - DevOps/System Admins).
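+
+A minimal sketch could look like the snippet below; note that the argument used to point the decorator at the
+pre-existing environment (assumed here to be ``python``, the path to that venv's Python binary) and the venv path
+itself are assumptions based on this PR, not a confirmed API:
+
+.. code-block:: python
+
+    import datetime
+
+    from airflow.decorators import dag, task
+
+    # Path to a virtualenv prepared upfront and deployed to all workers (illustrative).
+    VENV_PYTHON = "/opt/venvs/reporting/bin/python"
+
+
+    @dag(start_date=datetime.datetime(2022, 1, 1), schedule_interval=None, catchup=False)
+    def preexisting_venv_example():
+        # NOTE: the ``python`` argument name is an assumption based on this PR's discussion.
+        @task.preexisting_virtualenv(python=VENV_PYTHON)
+        def summarize():
+            # Libraries installed in the pre-existing venv (pandas is assumed here)
+            # are imported inside the callable, not at the top level of the DAG file.
+            import pandas as pd
+
+            return int(pd.Series([1, 2, 3]).sum())
+
+        summarize()
+
+
+    preexisting_venv_example()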
+
+Those virtual environments can be prepared in various ways - if you use LocalExecutor, they just need to be installed
+on the machine where the scheduler runs; if you are using a distributed Celery installation, there
+should be a pipeline that installs those virtual environments across multiple machines; finally, if you are using a
+Docker image (for example via Kubernetes), the virtualenv creation should be added to the pipeline of
+your custom image building.
+
+The benefits of the operator are:
+
+* No setup overhead when running the task. The virtualenv is ready when you start running a task.
+* You can run tasks with different sets of dependencies on the same workers - thus all resources are reused.
+* There is no need for the workers to have access to PyPI or private repositories. There is less chance of
+  transient errors resulting from networking.
+* The dependencies can be pre-vetted by the admins and your security team; no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Limited impact on your deployment - you do not need to switch to Docker containers or Kubernetes to
+  make good use of the operator.
+* No need to learn more about containers or Kubernetes as a DAG Author. Only knowledge of Python and requirements
+  is required to author DAGs this way.
+
+The drawbacks:
+
+* Your environment needs to have the virtual environments prepared upfront. This usually means that you
+  cannot change them on the fly; adding new requirements or changing existing ones requires at least an Airflow
+  re-deployment, and iteration time when you work on new versions might be longer.
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The tasks are only isolated from each other by running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files, etc.
+
+Actually, you can think of the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as a DAG author you'd normally iterate on dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (thus decorating your tasks with the ``@task.virtualenv`` decorator), while
+after the iteration and changes you would likely want to switch to
+the ``PreexistingPythonVirtualenvOperator`` for production, after your DevOps/System Admin teams deploy your new
+virtualenv to production. The nice thing about this is that you can switch the decorator back
+at any time and continue developing it "dynamically" with ``PythonVirtualenvOperator``, as sketched below.
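+
+A compact sketch of that switch, with the same callable body and only the decorator changing (again, the
+``preexisting_virtualenv`` decorator arguments and the venv path are assumptions based on this PR):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    # During development: dependencies are installed dynamically on every run.
+    @task.virtualenv(requirements=["pandas==1.4.3"])
+    def summarize_dev():
+        import pandas as pd
+
+        return int(pd.Series([1, 2, 3]).sum())
+
+
+    # In production: the same callable runs in a venv prepared and deployed upfront.
+    @task.preexisting_virtualenv(python="/opt/venvs/reporting/bin/python")
+    def summarize_prod():
+        import pandas as pd
+
+        return int(pd.Series([1, 2, 3]).sum())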
+
+
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use the Docker Operator or the Kubernetes Pod Operator. Those require that Airflow runs in a
+Docker container environment or a Kubernetes environment (or at the very least has access to create and
+run tasks with those).
+
+Similarly to the case of the Python operators, the TaskFlow decorators are handy if you would like to
+use those operators to execute your callable Python code.
+
+It is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use

Review Comment:
   ```suggestion
   However, it is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use Docker Operator or Kubernetes Pod Operator. Those require that Airflow runs in
+Docker container environment or Kubernetes environment (or at the very least have access to create and
+run tasks with those.

Review Comment:
   ```suggestion
   run tasks with those).
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+Actually, you can think about the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as DAG author you'd normally iterate with dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (thus decorating your tasks with ``@task.virtualenv`` decorators, while
+after the iteration and changes you would likely want to change it for production to switch to
+the ``PreexistingPythonVirtualenvOperator`` after your DevOps/System Admin teams deploy your new

Review Comment:
   ```suggestion
   the ``PythonPreexistingVirtualenvOperator`` after your DevOps/System Admin teams deploy your new
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use the Docker Operator or the Kubernetes Pod Operator. Those require that Airflow runs in a
+Docker container environment or a Kubernetes environment (or at the very least has access to create and
+run tasks with those).
+
+Similarly to the case of the Python operators, the TaskFlow decorators are handy if you would like to
+use those operators to execute your callable Python code.
+
+However, it is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+this approach, but the tasks are fully isolated from each other, and you are not even limited to running
+Python code. You can write your tasks in any programming language you want. Also, your dependencies are
+fully independent from the Airflow ones (including the system-level dependencies), so if your task requires
+a very different environment, this is the way to go. The relevant decorators are ``@task.docker`` and
+``@task.kubernetes``; a minimal sketch follows below.
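+
+For illustration, a minimal ``@task.docker`` sketch could look like this (the image name is illustrative and must
+contain everything the callable needs; the decorator requires the Docker provider package to be installed):
+
+.. code-block:: python
+
+    import datetime
+
+    from airflow.decorators import dag, task
+
+
+    @dag(start_date=datetime.datetime(2022, 1, 1), schedule_interval=None, catchup=False)
+    def docker_example():
+        @task.docker(image="python:3.9-slim")
+        def double(x: int) -> int:
+            # The callable runs inside the container, fully isolated from the Airflow environment.
+            return x * 2
+
+        double(21)
+
+
+    docker_example()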
+
+The benefits of those operators are:
+
+* You can run tasks with different sets of both Python and system-level dependencies, or even tasks
+  written in a completely different language, or even for a different processor architecture (x86 vs. arm).
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where a
+  similar set of dependencies can effectively reuse a number of cached layers of the image, so the
+  environment is optimized for the case where you have multiple similar, but different, environments.
+* The dependencies can be pre-vetted by the admins and your security team; no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Complete isolation between tasks. They cannot influence one another in ways other than using the standard
+  Airflow XCom mechanisms.
+
+The drawbacks:
+
+* There is an overhead to start the tasks. It is usually not as big as when creating virtual environments
+  dynamically, but it is still significant (especially for the Kubernetes Pod Operator).
+* Resource re-use is still OK, but a little less fine-grained than when running a task via a virtual environment.
+  There is an overhead that each running container and Pod introduces, depending on your deployment, but it is
+  generally higher than when running virtual environment task. Also, there is somewhat duplication of resources used.

Review Comment:
   ```suggestion
     generally higher than when running a virtual environment task. Also, the resources used are somewhat duplicated.
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+
+
+Handling Python dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Airflow has many Python dependencies and sometimes the Airflow dependencies are conflicting with dependencies that your
+task code expects. Since - by default - Airflow environment is just a single set of Python dependencies and single
+Python environment, often there might also be cases that some of your tasks require different dependencies than other tasks
+and the dependencies basically conflict between those tasks.
+
+If you are using pre-defined Airflow Operators to talk to external services, there is not much choice, but usually those
+operators will have dependencies that are not conflicting with basic Airflow dependencies. Airflow uses constraints mechanism
+which means that you have "fixed" set of dependencies that the community guarantees that Airflow can be installed with
+(including all community providers) without triggering conflicts. However you can upgrade the providers
+independently and there constraints do not limit you so chance of conflicting dependency is lower (you still have
+to test those dependencies). Therefore when you are using pre-defined operators, chance is that you will have
+little, to no problems with conflicting dependencies.
+
+However, when you are approaching Airflow in a more "modern way", where you use TaskFlow Api and most of
+your operators is written using custom python code, or when you want to write your own Custom Operator,
+you might get to the point where dependencies required by the custom code of yours are conflicting with those
+of Airflow, or even that dependencies of several of your Custom Operators introduce conflicts between themselves.
+
+There are a number of strategies that can be employed to mitigate the problem. And while dealing with
+dependency conflict in custom operators is difficult, it's actually quite a bit easier when it comes to
+Task-Flow approach or (equivalently) using ``PythonVirtualenvOperator`` or
+``PreexistingPythonVirtualenvOperator``.
+
+Let's start from the strategies that are easiest to implement (having some limits and overhead), and
+we will gradually go through those strategies that requires some changes in your Airflow deployment.
+
+Using PythonVirtualenvOperator
+------------------------------
+
+This is simplest to use and most limited strategy. The PythonVirtualenvOperator allows you to dynamically
+create virtualenv that your Python callable function will execute in. In the modern
+TaskFlow approach described in :doc:`/tutorial_taskflow_api`. this also can be done with decorating
+your callable with ``@task.virtualenv`` decorator (recommended way of using the operator).
+Each :class:`airflow.operators.python.PythonVirtualenvOperator` task can
+have it's own independent Python virtualenv and can specify fine-grained set of requirements that need
+to be installed for that task to execute.
+
+The operator takes care about:
+
+* creating the virtualenv based on your environment
+* serializing your Python callable and passing it to execution by the virtualenv Python interpreter
+* executing it and retrieving the result of the callable and pushing it via xcom if specified
+
+The benefits of the operator are:
+
+* There is no need to prepare the venv upfront. It will be dynamically created before task is run, and
+  removed after it is finished, so there is nothing special (except having virtualenv package in your
+  airflow dependencies) to make use of multiple virtual environments
+* You can run tasks with different sets of dependencies on the same workers - thus Memory resources are
+  reused (though see below about the CPU overhead involved in creating the venvs).
+* In bigger installations, DAG Authors do not need to ask anyone to create the venvs for you.
+  As DAG Author, you only have to have virtualenv dependency installed and you can specify and modify the
+  environments as you see fit.
+* No changes in deployment requirements - whether you use Local virtualenv, or Docker, or Kubernetes,
+  the tasks will work without adding anything to your deployment.
+* No need to learn more about containers, Kubernetes as DAG Author. Only knowledge of Python, requirements
+  is required to author DAGs this way.
+
+There are certain limitations and overhead introduced by the operator:
+
+* Your python callable has to be serializable. There are a number of python objects that are not serializable
+  using standard ``pickle`` library. You can mitigate some of those limitations by using ``dill`` library
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in Airflow environment must be locally imported in the callable you
+  use and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The operator adds a CPU, networking and elapsed time overhead for running each task - Airflow has
+  to re-create the virtualenv from scratch for each task
+* The workers need to have access to PyPI or private repositories to install dependencies
+* The dynamic creation of virtualenv is prone to transient failures (for example when your repo is not available
+  or when there is a networking issue with reaching the repository
+* It's easy to  fall into a "too" dynamic environment - since the dependencies you install might get upgraded
+  and their transitive dependencies might get independent upgrades you might end up with the situation where
+  your task will stop working because someone released a new version of a dependency or you might fall
+  a victim of "supply chain" attack where new version of a dependency might become malicious
+* The tasks are only isolated from each other via running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files et.c
+
+
+Using PreexistingPythonVirtualenvOperator
+-----------------------------------------
+
+.. versionadded:: 2.4
+
+A bit more complex but with significantly less overhead, security, stability problems is to use the
+:class:`airflow.operators.python.PreexistingPythonVirtualenvOperator``, or even better - decorating your callable with
+``@task.preexisting_virtualenv`` decorator. It requires however that the virtualenv you use is immutable
+by the task and prepared upfront in your environment (and available in all the workers in case your
+Airflow runs in a distributed environments). This way you avoid the overhead and problems of re-creating the
+virtual environment but they have to be prepared and deployed together with Airflow installation, so usually people
+who manage Airflow installation need to be involved (and in bigger installation those are usually different
+people than DAG Authors (DevOps/System Admins).
+
+Those virtual environments can be prepared in various ways - if you use LocalExecutor they just need to be installed
+at the machine where scheduler is run, if you are using distributed Celery virtualenv installations, there
+should be a pipeline that installs those virtual environments across multiple machines, finally if you are using
+Docker Image (for example via Kubernetes), the virtualenv creation should be added to the pipeline of
+your custom image building.
+
+The benefits of the operator are:
+
+* No setup overhead when running the task. The virtualenv is ready when you start running a task.
+* You can run tasks with different sets of dependencies on the same workers - thus all resources are reused.
+* There is no need to have access by workers to PyPI or private repositories. Less chance for transient
+  errors resulting from networking.
+* The dependencies can be pre-vetted by the admins and your security team, no unexpected, new code will
+  be added dynamically. This is good for both, security and stability.
+* Limited impact on your deployment - you do not need to switch to Docker containers or Kubernetes to
+  make a good use of the operator.
+* No need to learn more about containers, Kubernetes as DAG Author. Only knowledge of Python, requirements
+  is required to author DAGs this way.
+
+The drawbacks:
+
+* Your environment needs to have the virtual environments prepared upfront. This usually means that you
+  cannot change it on the flight, adding new or changing requirements require at least airflow re-deployment
+  and iteration time when you work on new versions might be longer.
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all of the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The tasks are only isolated from each other by running in different environments. This means that
+  running tasks can still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files, etc.
+
+Actually, you can think of the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as a DAG author you'd normally iterate on dependencies and develop your DAG using the
+``PythonVirtualenvOperator`` (thus decorating your tasks with the ``@task.virtualenv`` decorator), and
+after the iteration and changes you would likely want to switch to
+the ``PreexistingPythonVirtualenvOperator`` for production, once your DevOps/System Admin teams deploy your new
+virtualenv there. The nice thing about this is that you can switch the decorator back
+at any time and continue developing it "dynamically" with ``PythonVirtualenvOperator``.
+
+
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use Docker Operator or Kubernetes Pod Operator. Those require that Airflow runs in
+Docker container environment or Kubernetes environment (or at the very least has access to create and
+run tasks with those).
+
+Similarly to the case of the Python operators, the TaskFlow decorators are handy if you would like to
+use those operators to execute your callable Python code.
+
+It is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+this approach, but the tasks are fully isolated from each other and you are not even limited to running
+Python code. You can write your tasks in any programming language you want. Also, your dependencies are
+fully independent from the Airflow ones (including the system-level dependencies), so if your task requires
+a very different environment, this is the way to go. The relevant decorators are ``@task.docker``
+and ``@task.kubernetes``.
+
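+For the Docker case, a minimal sketch of the decorator form could look like the following (the
+image name is just a placeholder - any image shipping a compatible Python interpreter would do):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    @task.docker(image="python:3.9-slim-bullseye")
+    def word_count(text: str):
+        # Runs entirely inside the container, isolated from the worker's environment.
+        return len(text.split())
+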
+The benefits of those operators are:
+
+* You can run tasks with different sets of both Python and system-level dependencies, or even tasks
+  written in a completely different language, or even for a different processor architecture (x86 vs. arm).
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where
+  similar set of dependencies can effectively reuse a number of cached layers of the image, so the
+  environment is optimized for the case where you have multiple similar, but different environments.
+* The dependencies can be pre-vetted by the admins and your security team, no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Complete isolation between tasks. They cannot influence one another in ways other than using the standard
+  Airflow XCom mechanisms.
+
+The drawbacks:
+
+* There is an overhead to start the tasks. It is usually not as big as when creating virtual environments
+  dynamically, but still significant (especially for the Kubernetes Pod Operator).
+* Resource re-use is still OK but a little less fine grained than in case of running task via virtual environment.

Review Comment:
   ```suggestion
   * Resource re-use is still OK but a little less fine grained than in the case of running task via virtual environment.
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where

Review Comment:
   ```suggestion
   * The environment used to run the tasks enjoys the optimizations and immutability of containers, where a
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use Docker Operator or Kubernetes Pod Operator. Those require that Airflow runs in

Review Comment:
   ```suggestion
   Another strategy is to use the Docker Operator or the Kubernetes Pod Operator. Those require that Airflow runs in a
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+* Resource re-use is still OK but a little less fine grained than in case of running task via virtual environment.
+  There is an overhead that each running container and Pod introduces, depending on your deployment, and it is
+  generally higher than when running a virtual environment task. Also, there is some duplication of the resources used.
+  In the case of both the Docker and Kubernetes operators, running a task always requires at least two processes - one
+  process (running in the Docker Container or Kubernetes Pod) executing the task, and a supervising Airflow
+  Python task that submits the job to Docker/Kubernetes and monitors its execution.
+* Your environment needs to have the container images prepared upfront. This usually means that you
+  cannot change them on the fly; adding new or changing requirements requires at least an Airflow re-deployment,
+  and iteration time when you work on new versions might be much longer. An appropriate deployment pipeline
+  is a must here to be able to reliably maintain your deployment.
+* Your Python callable has to be serializable if you want to run it via the decorators; also, in this case
+  all dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* You need to understand more details about how Docker Containers or Kubernetes work. The abstractions
+  provided by those two are "leaky", so you need to understand a bit more about resources, networking,
+  containers etc. in order to author a DAG that uses those operators.
+
+
+Using multiple Docker Images and Celery Queues
+----------------------------------------------
+
+There is a possibility (though it requires a deep knowledge of Airflow deployment) to run Airflow tasks
+using multiple, independent Docker images. This can be achieved by allocating different tasks to different
+Queues and configuring your Celery workers to use different images for different Queues. This, however,
+(at least currently) requires a lot of manual deployment configuration and intrinsic knowledge of how
+Airflow, Celery and Kubernetes work. It also introduces quite some overhead for running the tasks - there
+is less chance of resource reuse, and it is much more difficult to fine-tune such a deployment for
+resource cost without impacting performance and stability.
+
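+As an illustration, routing a task to a dedicated queue is just a matter of setting the standard
+``queue`` argument - the wiring of queues to workers running different images has to be done in
+your Celery deployment (the queue name below is a made-up example):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    # Only Celery workers started with `airflow celery worker --queues heavy_ml`
+    # (and built from the dedicated image) will pick this task up.
+    @task(queue="heavy_ml")
+    def train_model():
+        ...
+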
+One of the possible ways to make this approach more useful is the implementation of
+`AIP-46 Runtime isolation for Airflow tasks and DAG parsing <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Runtime+isolation+for+airflow+tasks+and+dag+parsing>`_
+and the completion of `AIP-43 DAG Processor Separation <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation>`_.
+Until those are implemented, there are very little benefits of using this approach and it is not recommended.
+
+When those AIPs are implemented, however, this will open up the possibility of a more multi-tenant approach,
+where multiple teams will be able to have completely isolated sets of dependencies that will be used across
+all the lifecycle of DAG execution - from parsing to execution.

Review Comment:
   ```suggestion
   the full lifecycle of a DAG - from parsing to execution.
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+
+
+Handling Python dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Airflow has many Python dependencies and sometimes the Airflow dependencies are conflicting with dependencies that your
+task code expects. Since - by default - Airflow environment is just a single set of Python dependencies and single
+Python environment, often there might also be cases that some of your tasks require different dependencies than other tasks
+and the dependencies basically conflict between those tasks.
+
+If you are using pre-defined Airflow Operators to talk to external services, there is not much choice, but usually those
+operators will have dependencies that are not conflicting with basic Airflow dependencies. Airflow uses constraints mechanism
+which means that you have "fixed" set of dependencies that the community guarantees that Airflow can be installed with
+(including all community providers) without triggering conflicts. However you can upgrade the providers
+independently and there constraints do not limit you so chance of conflicting dependency is lower (you still have
+to test those dependencies). Therefore when you are using pre-defined operators, chance is that you will have
+little, to no problems with conflicting dependencies.
+
+However, when you are approaching Airflow in a more "modern way", where you use TaskFlow Api and most of
+your operators is written using custom python code, or when you want to write your own Custom Operator,
+you might get to the point where dependencies required by the custom code of yours are conflicting with those
+of Airflow, or even that dependencies of several of your Custom Operators introduce conflicts between themselves.
+
+There are a number of strategies that can be employed to mitigate the problem. And while dealing with
+dependency conflict in custom operators is difficult, it's actually quite a bit easier when it comes to
+Task-Flow approach or (equivalently) using ``PythonVirtualenvOperator`` or
+``PreexistingPythonVirtualenvOperator``.
+
+Let's start from the strategies that are easiest to implement (having some limits and overhead), and
+we will gradually go through those strategies that requires some changes in your Airflow deployment.
+
+Using PythonVirtualenvOperator
+------------------------------
+
+This is simplest to use and most limited strategy. The PythonVirtualenvOperator allows you to dynamically
+create virtualenv that your Python callable function will execute in. In the modern
+TaskFlow approach described in :doc:`/tutorial_taskflow_api`. this also can be done with decorating
+your callable with ``@task.virtualenv`` decorator (recommended way of using the operator).
+Each :class:`airflow.operators.python.PythonVirtualenvOperator` task can
+have it's own independent Python virtualenv and can specify fine-grained set of requirements that need
+to be installed for that task to execute.
+
+The operator takes care about:
+
+* creating the virtualenv based on your environment
+* serializing your Python callable and passing it to execution by the virtualenv Python interpreter
+* executing it and retrieving the result of the callable and pushing it via xcom if specified
+
+The benefits of the operator are:
+
+* There is no need to prepare the venv upfront. It will be dynamically created before task is run, and
+  removed after it is finished, so there is nothing special (except having virtualenv package in your
+  airflow dependencies) to make use of multiple virtual environments
+* You can run tasks with different sets of dependencies on the same workers - thus Memory resources are
+  reused (though see below about the CPU overhead involved in creating the venvs).
+* In bigger installations, DAG Authors do not need to ask anyone to create the venvs for you.
+  As DAG Author, you only have to have virtualenv dependency installed and you can specify and modify the
+  environments as you see fit.
+* No changes in deployment requirements - whether you use Local virtualenv, or Docker, or Kubernetes,
+  the tasks will work without adding anything to your deployment.
+* No need to learn more about containers, Kubernetes as DAG Author. Only knowledge of Python, requirements
+  is required to author DAGs this way.
+
+There are certain limitations and overhead introduced by the operator:
+
+* Your python callable has to be serializable. There are a number of python objects that are not serializable
+  using standard ``pickle`` library. You can mitigate some of those limitations by using ``dill`` library
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in Airflow environment must be locally imported in the callable you
+  use and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The operator adds a CPU, networking and elapsed time overhead for running each task - Airflow has
+  to re-create the virtualenv from scratch for each task
+* The workers need to have access to PyPI or private repositories to install dependencies
+* The dynamic creation of virtualenv is prone to transient failures (for example when your repo is not available
+  or when there is a networking issue with reaching the repository
+* It's easy to  fall into a "too" dynamic environment - since the dependencies you install might get upgraded
+  and their transitive dependencies might get independent upgrades you might end up with the situation where
+  your task will stop working because someone released a new version of a dependency or you might fall
+  a victim of "supply chain" attack where new version of a dependency might become malicious
+* The tasks are only isolated from each other via running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files et.c
+
+
+Using PreexistingPythonVirtualenvOperator
+-----------------------------------------
+
+.. versionadded:: 2.4
+
+A bit more complex but with significantly less overhead, security, stability problems is to use the
+:class:`airflow.operators.python.PreexistingPythonVirtualenvOperator``, or even better - decorating your callable with
+``@task.preexisting_virtualenv`` decorator. It requires however that the virtualenv you use is immutable
+by the task and prepared upfront in your environment (and available in all the workers in case your
+Airflow runs in a distributed environments). This way you avoid the overhead and problems of re-creating the
+virtual environment but they have to be prepared and deployed together with Airflow installation, so usually people
+who manage Airflow installation need to be involved (and in bigger installation those are usually different
+people than DAG Authors (DevOps/System Admins).
+
+Those virtual environments can be prepared in various ways - if you use LocalExecutor they just need to be installed
+at the machine where scheduler is run, if you are using distributed Celery virtualenv installations, there
+should be a pipeline that installs those virtual environments across multiple machines, finally if you are using
+Docker Image (for example via Kubernetes), the virtualenv creation should be added to the pipeline of
+your custom image building.
+
+The benefits of the operator are:
+
+* No setup overhead when running the task. The virtualenv is ready when you start running a task.
+* You can run tasks with different sets of dependencies on the same workers - thus all resources are reused.
+* There is no need to have access by workers to PyPI or private repositories. Less chance for transient
+  errors resulting from networking.
+* The dependencies can be pre-vetted by the admins and your security team, no unexpected, new code will
+  be added dynamically. This is good for both, security and stability.
+* Limited impact on your deployment - you do not need to switch to Docker containers or Kubernetes to
+  make a good use of the operator.
+* No need to learn more about containers, Kubernetes as DAG Author. Only knowledge of Python, requirements
+  is required to author DAGs this way.
+
+The drawbacks:
+
+* Your environment needs to have the virtual environments prepared upfront. This usually means that you
+  cannot change it on the flight, adding new or changing requirements require at least airflow re-deployment
+  and iteration time when you work on new versions might be longer.
+* Your python callable has to be serializable. There are a number of python objects that are not serializable
+  using standard ``pickle`` library. You can mitigate some of those limitations by using ``dill`` library
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in Airflow environment must be locally imported in the callable you
+  use and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments
+* The tasks are only isolated from each other via running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files et.c
+
+Actually, you can think about the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as DAG author you'd normally iterate with dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (thus decorating your tasks with ``@task.virtualenv`` decorators, while
+after the iteration and changes you would likely want to change it for production to switch to
+the ``PreexistingPythonVirtualenvOperator`` after your DevOps/System Admin teams deploy your new
+virtualenv to production. The nice thing about this is that you can switch the decorator back
+at any time and continue developing it "dynamically" with ``PythonVirtualenvOperator``.
+
+
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use Docker Operator or Kubernetes Pod Operator. Those require that Airflow runs in
+Docker container environment or Kubernetes environment (or at the very least have access to create and
+run tasks with those.
+
+Similarly as in case of Python operators, the taskflow decorators are handy for you if you would like to
+use those operators to execute your callable Python code.
+
+It is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+this approach, but the tasks are fully isolated from each other and you are not even limited to running
+Python code. You can write your tasks in any Programming language you want. Also your dependencies are
+fully independent from Airflow ones (including the system level dependencies) so if your task require
+very different environment, this is the way to go. Those are ``@task.docker`` and ``@task.kubernetes``
+decorators.
+
+The benefits of those operators are:
+
+* You can run tasks with different sets of both Python and system level dependencies, or even tasks
+  written in completely different language or even different processor architecture (x86 vs. arm).
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where
+  similar set of dependencies can effectively reuse a number of cached layers of the image, so the
+  environment is optimized for the case where you have multiple similar, but different environments.
+* The dependencies can be pre-vetted by the admins and your security team, no unexpected, new code will
+  be added dynamically. This is good for both, security and stability.
+* Complete isolation between tasks. They cannot influence one another in other ways than using standard
+  Airflow XCom mechanisms.
+
+The drawbacks:
+
+* There is an overhead to start the tasks. Usually not as big as when creating virtual environments dynamically,
+  but still significant (especially for the Kubernetes Pod Operator).
+* Resource re-use is still OK but a little less fine-grained than when running tasks via virtual environments.
+  There is an overhead that each running container and Pod introduces, depending on your deployment, but it
+  is generally higher than for a virtual environment task, and there is also some duplication of resources used.
+  In case of both the Docker and the Kubernetes operators, running a task always requires at least two
+  processes - one process (running in the Docker Container or Kubernetes Pod) executing the task, and a
+  supervising Airflow Python task that submits the job to Docker/Kubernetes and monitors its execution.
+* Your environment needs to have the container images prepared upfront. This usually means that you
+  cannot change them on the fly; adding new or changing requirements requires at least an Airflow re-deployment,
+  and iteration time when you work on new versions might be much longer. An appropriate deployment pipeline
+  is a must here to be able to reliably maintain your deployment.
+* Your Python callable has to be serializable if you want to run it via decorators. Also, in this case
+  all dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* You need to understand more details about how Docker Containers or Kubernetes work. The abstractions
+  provided by those two are "leaky", so you need to understand a bit more about resources, networking,
+  containers etc. in order to author a DAG that uses those operators.
+
+
+Using multiple Docker Images and Celery Queues
+----------------------------------------------
+
+There is a possibility (though it requires a deep knowledge of Airflow deployment) to run Airflow tasks
+using multiple, independent Docker images. This can be achieved by allocating different tasks to different
+Queues and configuring your Celery workers to use different images for different Queues. This however
+(at least currently) requires a lot of manual deployment configuration and intrinsic knowledge of how
+Airflow, Celery and Kubernetes work. It also introduces quite some overhead for running the tasks - there
+are fewer chances for resource reuse and it's much more difficult to fine-tune such a deployment for
+resource cost without impacting performance and stability.
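+
+For illustration only (the queue name is just an example and wiring each queue to a dedicated image is
+deployment-specific), routing a task to a dedicated queue is done with the ``queue`` argument of the operator:
+
+.. code-block:: python
+
+    from airflow.operators.python import PythonOperator
+
+    heavy_training = PythonOperator(
+        task_id="train_model",
+        python_callable=train,  # ``train`` is a hypothetical callable defined elsewhere in the DAG file
+        queue="ml_gpu",  # only workers listening on this queue will pick the task up
+    )
+
+A Celery worker started from the dedicated image with ``airflow celery worker --queues ml_gpu`` would then
+only consume tasks routed to that queue.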
+
+One of the possible ways to make this approach more useful is the implementation of
+`AIP-46 Runtime isolation for Airflow tasks and DAG parsing <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Runtime+isolation+for+airflow+tasks+and+dag+parsing>`_
+and the completion of `AIP-43 DAG Processor Separation <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation>`_.
+Until those are implemented, there are very little benefits of using this approach and it is not recommended.

Review Comment:
   ```suggestion
   Until those are implemented, there are very few benefits of using this approach and it is not recommended.
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+

Review Comment:
   Very fantastic and detailed doc Jarek! :tada: 
   
   If read through entirely, it should give folks who are new to Airflow a good summary of the different task execution options, and they will also learn more about how the Airflow execution environment itself works (where dependencies are sourced from, how tasks can conflict on shared execution hosts, etc).
   
   It looks like a lot of comments below, but there was a lot of text to read through :) Most of the comments are just grammar/naming.



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+* You need to understand more details about how Docker Containers or Kubernetes work. The abstraction
+  provided by those two are "leaky", so you need to understand a bit more about resources, networking,
+  containers etc. in order to Author DAG that uses those operators.

Review Comment:
   ```suggestion
     containers etc. in order to author a DAG that uses those operators.
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+It is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+this approach, but the tasks are fully isolated from each other and you are not even limited to running
+Python code. You can write your tasks in any Programming language you want. Also your dependencies are
+fully independent from Airflow ones (including the system level dependencies) so if your task require
+very different environment, this is the way to go. Those are ``@task.docker`` and ``@task.kubernetes``

Review Comment:
   ```suggestion
   a very different environment, this is the way to go. Those are ``@task.docker`` and ``@task.kubernetes``
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
+Until those are implemented, there are very little benefits of using this approach and it is not recommended.
+
+When those AIPs are implemented, however, this will open up the possibility of more multi-tenant approach,

Review Comment:
   ```suggestion
   When those AIPs are implemented, however, this will open up the possibility of a more multi-tenant approach,
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+
+
+Handling Python dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Airflow has many Python dependencies and sometimes the Airflow dependencies are conflicting with dependencies that your
+task code expects. Since - by default - Airflow environment is just a single set of Python dependencies and single
+Python environment, often there might also be cases that some of your tasks require different dependencies than other tasks
+and the dependencies basically conflict between those tasks.
+
+If you are using pre-defined Airflow Operators to talk to external services, there is not much choice, but usually those
+operators will have dependencies that are not conflicting with basic Airflow dependencies. Airflow uses constraints mechanism
+which means that you have "fixed" set of dependencies that the community guarantees that Airflow can be installed with
+(including all community providers) without triggering conflicts. However you can upgrade the providers
+independently and there constraints do not limit you so chance of conflicting dependency is lower (you still have
+to test those dependencies). Therefore when you are using pre-defined operators, chance is that you will have
+little, to no problems with conflicting dependencies.
+
+However, when you are approaching Airflow in a more "modern way", where you use TaskFlow Api and most of
+your operators is written using custom python code, or when you want to write your own Custom Operator,
+you might get to the point where dependencies required by the custom code of yours are conflicting with those
+of Airflow, or even that dependencies of several of your Custom Operators introduce conflicts between themselves.
+
+There are a number of strategies that can be employed to mitigate the problem. And while dealing with
+dependency conflict in custom operators is difficult, it's actually quite a bit easier when it comes to
+Task-Flow approach or (equivalently) using ``PythonVirtualenvOperator`` or
+``PreexistingPythonVirtualenvOperator``.
+
+Let's start from the strategies that are easiest to implement (having some limits and overhead), and
+we will gradually go through those strategies that requires some changes in your Airflow deployment.
+
+Using PythonVirtualenvOperator
+------------------------------
+
+This is simplest to use and most limited strategy. The PythonVirtualenvOperator allows you to dynamically
+create virtualenv that your Python callable function will execute in. In the modern
+TaskFlow approach described in :doc:`/tutorial_taskflow_api`. this also can be done with decorating
+your callable with ``@task.virtualenv`` decorator (recommended way of using the operator).
+Each :class:`airflow.operators.python.PythonVirtualenvOperator` task can
+have it's own independent Python virtualenv and can specify fine-grained set of requirements that need
+to be installed for that task to execute.
+
+The operator takes care about:
+
+* creating the virtualenv based on your environment
+* serializing your Python callable and passing it to execution by the virtualenv Python interpreter
+* executing it and retrieving the result of the callable and pushing it via xcom if specified
+
+The benefits of the operator are:
+
+* There is no need to prepare the venv upfront. It will be dynamically created before task is run, and
+  removed after it is finished, so there is nothing special (except having virtualenv package in your
+  airflow dependencies) to make use of multiple virtual environments
+* You can run tasks with different sets of dependencies on the same workers - thus Memory resources are
+  reused (though see below about the CPU overhead involved in creating the venvs).
+* In bigger installations, DAG Authors do not need to ask anyone to create the venvs for you.
+  As DAG Author, you only have to have virtualenv dependency installed and you can specify and modify the
+  environments as you see fit.
+* No changes in deployment requirements - whether you use Local virtualenv, or Docker, or Kubernetes,
+  the tasks will work without adding anything to your deployment.
+* No need to learn more about containers, Kubernetes as DAG Author. Only knowledge of Python, requirements
+  is required to author DAGs this way.
+
+There are certain limitations and overhead introduced by the operator:
+
+* Your python callable has to be serializable. There are a number of python objects that are not serializable
+  using standard ``pickle`` library. You can mitigate some of those limitations by using ``dill`` library
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in Airflow environment must be locally imported in the callable you
+  use and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The operator adds a CPU, networking and elapsed time overhead for running each task - Airflow has
+  to re-create the virtualenv from scratch for each task
+* The workers need to have access to PyPI or private repositories to install dependencies
+* The dynamic creation of virtualenv is prone to transient failures (for example when your repo is not available
+  or when there is a networking issue with reaching the repository
+* It's easy to  fall into a "too" dynamic environment - since the dependencies you install might get upgraded
+  and their transitive dependencies might get independent upgrades you might end up with the situation where
+  your task will stop working because someone released a new version of a dependency or you might fall
+  a victim of "supply chain" attack where new version of a dependency might become malicious
+* The tasks are only isolated from each other via running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files et.c
+
+
+Using PreexistingPythonVirtualenvOperator
+-----------------------------------------
+
+.. versionadded:: 2.4
+
+A bit more complex but with significantly less overhead, security, stability problems is to use the
+:class:`airflow.operators.python.PreexistingPythonVirtualenvOperator``, or even better - decorating your callable with
+``@task.preexisting_virtualenv`` decorator. It requires however that the virtualenv you use is immutable
+by the task and prepared upfront in your environment (and available in all the workers in case your
+Airflow runs in a distributed environments). This way you avoid the overhead and problems of re-creating the
+virtual environment but they have to be prepared and deployed together with Airflow installation, so usually people
+who manage Airflow installation need to be involved (and in bigger installation those are usually different
+people than DAG Authors (DevOps/System Admins).
+
+Those virtual environments can be prepared in various ways - if you use LocalExecutor they just need to be installed
+at the machine where scheduler is run, if you are using distributed Celery virtualenv installations, there
+should be a pipeline that installs those virtual environments across multiple machines, finally if you are using
+Docker Image (for example via Kubernetes), the virtualenv creation should be added to the pipeline of
+your custom image building.
+
+The benefits of the operator are:
+
+* No setup overhead when running the task. The virtualenv is ready when you start running a task.
+* You can run tasks with different sets of dependencies on the same workers - thus all resources are reused.
+* There is no need to have access by workers to PyPI or private repositories. Less chance for transient
+  errors resulting from networking.
+* The dependencies can be pre-vetted by the admins and your security team, no unexpected, new code will
+  be added dynamically. This is good for both, security and stability.
+* Limited impact on your deployment - you do not need to switch to Docker containers or Kubernetes to
+  make a good use of the operator.
+* No need to learn more about containers, Kubernetes as DAG Author. Only knowledge of Python, requirements
+  is required to author DAGs this way.
+
+The drawbacks:
+
+* Your environment needs to have the virtual environments prepared upfront. This usually means that you
+  cannot change it on the flight, adding new or changing requirements require at least airflow re-deployment
+  and iteration time when you work on new versions might be longer.
+* Your python callable has to be serializable. There are a number of python objects that are not serializable
+  using standard ``pickle`` library. You can mitigate some of those limitations by using ``dill`` library
+  but even that library does not solve all the serialization limitations.
+* All dependencies that are not available in Airflow environment must be locally imported in the callable you
+  use and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments
+* The tasks are only isolated from each other via running in different environments. This makes it possible
+  that running tasks will still interfere with each other - for example subsequent tasks executed on the
+  same worker might be affected by previous tasks creating/modifying files et.c
+
+You can think of the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as a DAG author you would normally iterate on the dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (thus decorating your tasks with the ``@task.virtualenv`` decorator), while
+after the iteration and changes settle down, you would likely want to switch to
+the ``PreexistingPythonVirtualenvOperator`` for production, once your DevOps/System Admin teams deploy your new
+virtualenv there. The nice thing about this is that you can switch the decorator back
+at any time and continue developing it "dynamically" with ``PythonVirtualenvOperator``.
+
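+In practice the switch is usually just a matter of changing the decorator line - for example
+(the argument values are illustrative):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    # During development, the environment is created dynamically for every run:
+    @task.virtualenv(requirements=["pandas==1.4.3"])
+    # For production, swap only the decorator once the virtualenv has been deployed, e.g.:
+    # @task.preexisting_virtualenv(python="/opt/venvs/reporting/bin/python")
+    def transform():
+        import pandas as pd
+
+        return pd.__version__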
+
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use the DockerOperator or the Kubernetes Pod Operator. Those require that Airflow runs in
+a Docker container environment or a Kubernetes environment (or at the very least has access to create and
+run tasks with those).
+
+Similarly to the Python operators, the TaskFlow decorators are handy if you would like to
+use those operators to execute your callable Python code.
+
+It is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+this approach, but the tasks are fully isolated from each other and you are not even limited to running
+Python code. You can write your tasks in any programming language you want. Also, your dependencies are
+fully independent from the Airflow ones (including the system-level dependencies), so if your tasks require
+a very different environment, this is the way to go. The corresponding decorators are ``@task.docker`` and
+``@task.kubernetes``.
+
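+For example, a minimal sketch using the Docker decorator (this assumes the Docker provider is installed;
+the image name is illustrative):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    @task.docker(image="python:3.9-slim")
+    def count_words():
+        # This code runs inside the container, fully isolated from the Airflow environment.
+        return len("counting words in the container".split())
+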
+The benefits of those operators are:
+
+* You can run tasks with different sets of both Python and system-level dependencies, or even tasks
+  written in a completely different language or for a different processor architecture (x86 vs. arm).
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where a
+  similar set of dependencies can effectively reuse a number of cached layers of the image, so the
+  approach is optimized for the case where you have multiple similar, but different environments.
+* The dependencies can be pre-vetted by the admins and your security team; no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Complete isolation between tasks. They cannot influence one another in ways other than using the standard
+  Airflow XCom mechanisms.
+
+The drawbacks:
+
+* There is an overhead to start the tasks. Usually not as big as when creating virtual environments dynamically,
+  but still significant (especially for Kubernetes Pod Operator).
+* Resource re-use is still OK but a little less fine-grained than when running tasks via virtual environments.
+  Each running container or Pod introduces some overhead, depending on your deployment, and it is generally
+  higher than for a task running in a virtual environment. There is also some duplication of the resources used.
+  In case of both the Docker and Kubernetes operators, running a task requires at least two processes - one
+  process (running in the Docker container or Kubernetes Pod) executing the task, and a supervising Airflow
+  Python task that submits the job to Docker/Kubernetes and monitors its execution.
+* Your environment needs to have the container images prepared upfront. This usually means that you
+  cannot change them on the fly; adding new or changing requirements requires at least an Airflow re-deployment,
+  and the iteration time when you work on new versions might be much longer. An appropriate deployment pipeline here

Review Comment:
   ```suggestion
     and iteration time when you work on new versions might be much longer. An appropriate deployment pipeline here
   ```
   
   Might want to expand on what is "appropriate" here.



+  is a must to be able to reliably maintain your deployment.
+* Your Python callable has to be serializable if you want to run it via the decorators. Also, in this case,
+  all dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* You need to understand more details about how Docker containers or Kubernetes work. The abstractions
+  provided by those two are "leaky", so you need to understand a bit more about resources, networking,
+  containers etc. in order to author a DAG that uses those operators.
+
+
+Using multiple Docker Images and Celery Queues
+----------------------------------------------
+
+There is a possibility (though it requires a deep knowledge of Airflow deployment) to run Airflow tasks
+using multiple, independent Docker images. This can be achieved by allocating different tasks to different
+queues and configuring your Celery workers to use different images for different queues. However, this
+(at least currently) requires a lot of manual deployment configuration and in-depth knowledge of how
+Airflow, Celery and Kubernetes work. Also it introduces quite some overhead for running the tasks - there

Review Comment:
   ```suggestion
   Airflow, Celery and Kubernetes works. Also it introduces quite some overhead for running the tasks - there
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
+Actually, you can think about the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as DAG author you'd normally iterate with dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (thus decorating your tasks with ``@task.virtualenv`` decorators, while

Review Comment:
   ```suggestion
   ``PythonVirtualenvOperator`` (thus decorating your tasks with ``@task.virtualenv`` decorators) while
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
+* Your environment needs to have the container images prepared upfront. This usually means that you
+  cannot change it on the flight, adding new or changing requirements require at least airflow re-deployment

Review Comment:
   ```suggestion
     cannot change it on the fly, adding new or changing requirements requires at least an Airflow re-deployment
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
+* Resource re-use is still OK but a little less fine grained than in case of running task via virtual environment.
+  There is an overhead that each running container and Pod introduce, depending on your deployment, but it is
+  generally higher than when running virtual environment task. Also, there is somewhat duplication of resources used.
+  In case of both Docker and Kubernetes operator, running tasks requires always at least two processes - one

Review Comment:
   ```suggestion
     In case of both Docker and Kubernetes operator, running tasks requires at least two processes - one
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+
+
+Handling Python dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Airflow has many Python dependencies, and sometimes the Airflow dependencies conflict with the dependencies that your
+task code expects. Since - by default - the Airflow environment is just a single set of Python dependencies and a single
+Python environment, there might also be cases where some of your tasks require different dependencies than other tasks
+and those dependencies conflict with each other.
+
+If you are using pre-defined Airflow Operators to talk to external services, there is not much choice, but usually those
+operators will have dependencies that do not conflict with the basic Airflow dependencies. Airflow uses a constraints mechanism,
+which means that you have a "fixed" set of dependencies that the community guarantees Airflow can be installed with
+(including all community providers) without triggering conflicts. However, you can upgrade the providers
+independently and their constraints do not limit you, so the chance of a conflicting dependency is lower (you still have
+to test those dependencies). Therefore, when you are using pre-defined operators, chances are that you will have
+little to no problems with conflicting dependencies.
+
+However, when you are approaching Airflow in a more "modern way", where you use the TaskFlow API and most of
+your operators are written using custom Python code, or when you want to write your own Custom Operator,
+you might get to the point where the dependencies required by your custom code conflict with those
+of Airflow, or even that the dependencies of several of your Custom Operators conflict with each other.
+
+There are a number of strategies that can be employed to mitigate the problem. And while dealing with
+dependency conflicts in custom operators is difficult, it's actually quite a bit easier when it comes to
+the TaskFlow approach or (equivalently) using ``PythonVirtualenvOperator`` or
+``PreexistingPythonVirtualenvOperator``.
+
+Let's start with the strategies that are easiest to implement (though they have some limits and overhead), and
+then gradually go through those strategies that require some changes in your Airflow deployment.
+
+Using PythonVirtualenvOperator
+------------------------------
+
+This is the simplest to use and most limited strategy. The PythonVirtualenvOperator allows you to dynamically
+create a virtualenv that your Python callable function will execute in. In the modern
+TaskFlow approach described in :doc:`/tutorial_taskflow_api`, this can also be done by decorating
+your callable with the ``@task.virtualenv`` decorator (the recommended way of using the operator).
+Each :class:`airflow.operators.python.PythonVirtualenvOperator` task can
+have its own independent Python virtualenv and can specify a fine-grained set of requirements that need
+to be installed for that task to execute.
+
+The operator takes care of:
+
+* creating the virtualenv based on your environment
+* serializing your Python callable and passing it to the virtualenv Python interpreter for execution
+* executing it, then retrieving the result of the callable and pushing it via XCom if specified
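+
+For reference, a minimal sketch of the classic (non-decorator) form as it might appear inside a DAG
+definition; the callable and the pinned requirement below are just illustrative:
+
+.. code-block:: python
+
+    from airflow.operators.python import PythonVirtualenvOperator
+
+
+    def callable_virtualenv():
+        # the dependency is only imported inside the callable, never at the top level of the DAG file
+        import pandas as pd
+
+        return pd.__version__
+
+
+    virtualenv_task = PythonVirtualenvOperator(
+        task_id="virtualenv_python",
+        python_callable=callable_virtualenv,
+        requirements=["pandas==1.4.2"],
+        system_site_packages=False,
+    )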
+
+The benefits of the operator are:
+
+* There is no need to prepare the venv upfront. It will be dynamically created before the task is run, and
+  removed after it is finished, so there is nothing special (except having the virtualenv package in your
+  Airflow dependencies) required to make use of multiple virtual environments.
+* You can run tasks with different sets of dependencies on the same workers - thus memory resources are
+  reused (though see below about the CPU overhead involved in creating the venvs).
+* In bigger installations, DAG Authors do not need to ask anyone to create the venvs for them.
+  As a DAG Author, you only have to have the virtualenv dependency installed and you can specify and modify the
+  environments as you see fit.
+* No changes in deployment requirements - whether you use a local virtualenv, Docker, or Kubernetes,
+  the tasks will work without adding anything to your deployment.
+* No need to learn more about containers or Kubernetes as a DAG Author. Only knowledge of Python and requirements
+  is needed to author DAGs this way.
+
+There are certain limitations and overhead introduced by the operator:
+
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all of the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The operator adds CPU, networking and elapsed-time overhead for running each task - Airflow has
+  to re-create the virtualenv from scratch for each task.
+* The workers need to have access to PyPI or private repositories to install the dependencies.
+* The dynamic creation of the virtualenv is prone to transient failures (for example when your repo is not available
+  or when there is a networking issue with reaching the repository).
+* It's easy to fall into a "too dynamic" environment - since the dependencies you install might get upgraded
+  and their transitive dependencies might get independent upgrades, you might end up in a situation where
+  your task stops working because someone released a new version of a dependency, or you might fall
+  victim to a "supply chain" attack where a new version of a dependency becomes malicious.
+* The tasks are only isolated from each other via running in different environments. This means that
+  running tasks can still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating or modifying files, etc.
+
+
+Using PreexistingPythonVirtualenvOperator
+-----------------------------------------
+
+.. versionadded:: 2.4
+
+A bit more complex, but with significantly less overhead and fewer security and stability problems, is to use the
+:class:`airflow.operators.python.PreexistingPythonVirtualenvOperator`, or even better - to decorate your callable with
+the ``@task.preexisting_virtualenv`` decorator. It requires, however, that the virtualenv you use is immutable
+by the task and prepared upfront in your environment (and available on all the workers in case your
+Airflow runs in a distributed environment). This way you avoid the overhead and problems of re-creating the
+virtual environment, but the environments have to be prepared and deployed together with the Airflow installation,
+so usually the people who manage the Airflow installation need to be involved (and in bigger installations those
+are usually different people than the DAG Authors - DevOps/System Admins).
+
+Those virtual environments can be prepared in various ways - if you use LocalExecutor they just need to be installed
+on the machine where the scheduler runs; if you are using a distributed Celery installation, there
+should be a pipeline that installs those virtual environments across multiple machines; finally, if you are using
+a Docker Image (for example via Kubernetes), the virtualenv creation should be added to the pipeline of
+your custom image building.
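+
+As a rough sketch of the intended usage (the argument name ``python`` pointing at the pre-built interpreter
+is an assumption - check the operator's signature as introduced in this PR - and the path is just an example
+of where an admin-prepared environment might live):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    @task.preexisting_virtualenv(python="/opt/airflow/venvs/reporting/bin/python")
+    def generate_report(run_date: str):
+        # the pre-built, immutable environment already contains this dependency
+        import matplotlib
+
+        return matplotlib.__version__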
+
+The benefits of the operator are:
+
+* No setup overhead when running the task. The virtualenv is ready when you start running a task.
+* You can run tasks with different sets of dependencies on the same workers - thus all resources are reused.
+* The workers do not need access to PyPI or private repositories, so there is less chance of transient
+  errors resulting from networking.
+* The dependencies can be pre-vetted by the admins and your security team, and no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Limited impact on your deployment - you do not need to switch to Docker containers or Kubernetes to
+  make good use of the operator.
+* No need to learn more about containers or Kubernetes as a DAG Author. Only knowledge of Python and requirements
+  is needed to author DAGs this way.
+
+The drawbacks:
+
+* Your environment needs to have the virtual environments prepared upfront. This usually means that you
+  cannot change them on the fly; adding new or changing requirements requires at least an Airflow re-deployment,
+  and the iteration time when you work on new versions might be longer.
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all of the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The tasks are only isolated from each other via running in different environments. This means that
+  running tasks can still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating or modifying files, etc.
+
+You can think of the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as a DAG author you would normally iterate on dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (decorating your tasks with the ``@task.virtualenv`` decorator), and once
+that iteration is done you would switch to the ``PreexistingPythonVirtualenvOperator`` for production,
+after your DevOps/System Admin teams deploy your new virtualenv there. The nice thing about this is
+that you can switch the decorator back at any time and continue developing it "dynamically" with
+``PythonVirtualenvOperator``.
+
+
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use the Docker Operator or the Kubernetes Pod Operator. Those require that Airflow runs in
+a Docker container environment or a Kubernetes environment (or at the very least has access to create and
+run tasks with those).
+
+As with the Python operators, the TaskFlow decorators are handy if you would like to
+use those operators to execute your callable Python code.
+
+This approach is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+it, but the tasks are fully isolated from each other and you are not even limited to running
+Python code. You can write your tasks in any programming language you want. Also, your dependencies are
+fully independent from the Airflow ones (including the system-level dependencies), so if your tasks require
+a very different environment, this is the way to go. The corresponding decorators are ``@task.docker``
+and ``@task.kubernetes``.
+
+The benefits of those operators are:
+
+* You can run tasks with different sets of both Python and system-level dependencies, or even tasks
+  written in a completely different language or even for a different processor architecture (x86 vs. arm).
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where a
+  similar set of dependencies can effectively reuse a number of cached layers of the image, so this
+  approach is optimized for the case where you have multiple similar, but different, environments.
+* The dependencies can be pre-vetted by the admins and your security team, and no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Complete isolation between tasks. They cannot influence one another other than through the standard
+  Airflow XCom mechanisms.
+
+The drawbacks:
+
+* There is an overhead to start the tasks. It is usually not as big as when creating virtual environments
+  dynamically, but it is still significant (especially for the Kubernetes Pod Operator).
+* Resource re-use is still OK, but a little less fine-grained than when running tasks via virtual environments.
+  There is an overhead that each running container and Pod introduces, depending on your deployment, and it is
+  generally higher than for a virtual environment task. There is also some duplication of the resources used.
+  In case of both Docker and Kubernetes operator, running tasks requires always at least two processes - one
+  process (running in Docker Container or Kubernetes Pod) executing the task, and supervising Airflow

Review Comment:
   ```suggestion
     process (running in Docker Container or Kubernetes Pod) executing the task, and another process supervising the Airflow
   ```



##########
docs/apache-airflow/best-practices.rst:
##########
@@ -619,3 +621,219 @@ Prune data before upgrading
 ---------------------------
 
 Some database migrations can be time-consuming.  If your metadata database is very large, consider pruning some of the old data with the :ref:`db clean<cli-db-clean>` command prior to performing the upgrade.  *Use with caution.*
+
+
+Handling Python dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Airflow has many Python dependencies, and sometimes the Airflow dependencies conflict with the dependencies that your
+task code expects. Since - by default - the Airflow environment is just a single set of Python dependencies and a single
+Python environment, there might also be cases where some of your tasks require different dependencies than other tasks
+and those dependencies conflict with each other.
+
+If you are using pre-defined Airflow Operators to talk to external services, there is not much choice, but usually those
+operators will have dependencies that do not conflict with the basic Airflow dependencies. Airflow uses a constraints mechanism,
+which means that you have a "fixed" set of dependencies that the community guarantees Airflow can be installed with
+(including all community providers) without triggering conflicts. However, you can upgrade the providers
+independently and their constraints do not limit you, so the chance of a conflicting dependency is lower (you still have
+to test those dependencies). Therefore, when you are using pre-defined operators, chances are that you will have
+little to no problems with conflicting dependencies.
+
+However, when you are approaching Airflow in a more "modern way", where you use the TaskFlow API and most of
+your operators are written using custom Python code, or when you want to write your own Custom Operator,
+you might get to the point where the dependencies required by your custom code conflict with those
+of Airflow, or even that the dependencies of several of your Custom Operators conflict with each other.
+
+There are a number of strategies that can be employed to mitigate the problem. And while dealing with
+dependency conflicts in custom operators is difficult, it's actually quite a bit easier when it comes to
+the TaskFlow approach or (equivalently) using ``PythonVirtualenvOperator`` or
+``PreexistingPythonVirtualenvOperator``.
+
+Let's start with the strategies that are easiest to implement (though they have some limits and overhead), and
+then gradually go through those strategies that require some changes in your Airflow deployment.
+
+Using PythonVirtualenvOperator
+------------------------------
+
+This is the simplest to use and most limited strategy. The PythonVirtualenvOperator allows you to dynamically
+create a virtualenv that your Python callable function will execute in. In the modern
+TaskFlow approach described in :doc:`/tutorial_taskflow_api`, this can also be done by decorating
+your callable with the ``@task.virtualenv`` decorator (the recommended way of using the operator).
+Each :class:`airflow.operators.python.PythonVirtualenvOperator` task can
+have its own independent Python virtualenv and can specify a fine-grained set of requirements that need
+to be installed for that task to execute.
+
+The operator takes care of:
+
+* creating the virtualenv based on your environment
+* serializing your Python callable and passing it to the virtualenv Python interpreter for execution
+* executing it, then retrieving the result of the callable and pushing it via XCom if specified
+
+The benefits of the operator are:
+
+* There is no need to prepare the venv upfront. It will be dynamically created before the task is run, and
+  removed after it is finished, so there is nothing special (except having the virtualenv package in your
+  Airflow dependencies) required to make use of multiple virtual environments.
+* You can run tasks with different sets of dependencies on the same workers - thus memory resources are
+  reused (though see below about the CPU overhead involved in creating the venvs).
+* In bigger installations, DAG Authors do not need to ask anyone to create the venvs for them.
+  As a DAG Author, you only have to have the virtualenv dependency installed and you can specify and modify the
+  environments as you see fit.
+* No changes in deployment requirements - whether you use a local virtualenv, Docker, or Kubernetes,
+  the tasks will work without adding anything to your deployment.
+* No need to learn more about containers or Kubernetes as a DAG Author. Only knowledge of Python and requirements
+  is needed to author DAGs this way.
+
+There are certain limitations and overhead introduced by the operator:
+
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all of the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The operator adds CPU, networking and elapsed-time overhead for running each task - Airflow has
+  to re-create the virtualenv from scratch for each task.
+* The workers need to have access to PyPI or private repositories to install the dependencies.
+* The dynamic creation of the virtualenv is prone to transient failures (for example when your repo is not available
+  or when there is a networking issue with reaching the repository).
+* It's easy to fall into a "too dynamic" environment - since the dependencies you install might get upgraded
+  and their transitive dependencies might get independent upgrades, you might end up in a situation where
+  your task stops working because someone released a new version of a dependency, or you might fall
+  victim to a "supply chain" attack where a new version of a dependency becomes malicious.
+* The tasks are only isolated from each other via running in different environments. This means that
+  running tasks can still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating or modifying files, etc.
+
+
+Using PreexistingPythonVirtualenvOperator
+-----------------------------------------
+
+.. versionadded:: 2.4
+
+A bit more complex, but with significantly less overhead and fewer security and stability problems, is to use the
+:class:`airflow.operators.python.PreexistingPythonVirtualenvOperator`, or even better - to decorate your callable with
+the ``@task.preexisting_virtualenv`` decorator. It requires, however, that the virtualenv you use is immutable
+by the task and prepared upfront in your environment (and available on all the workers in case your
+Airflow runs in a distributed environment). This way you avoid the overhead and problems of re-creating the
+virtual environment, but the environments have to be prepared and deployed together with the Airflow installation,
+so usually the people who manage the Airflow installation need to be involved (and in bigger installations those
+are usually different people than the DAG Authors - DevOps/System Admins).
+
+Those virtual environments can be prepared in various ways - if you use LocalExecutor they just need to be installed
+on the machine where the scheduler runs; if you are using a distributed Celery installation, there
+should be a pipeline that installs those virtual environments across multiple machines; finally, if you are using
+a Docker Image (for example via Kubernetes), the virtualenv creation should be added to the pipeline of
+your custom image building.
+
+The benefits of the operator are:
+
+* No setup overhead when running the task. The virtualenv is ready when you start running a task.
+* You can run tasks with different sets of dependencies on the same workers - thus all resources are reused.
+* The workers do not need access to PyPI or private repositories, so there is less chance of transient
+  errors resulting from networking.
+* The dependencies can be pre-vetted by the admins and your security team, and no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Limited impact on your deployment - you do not need to switch to Docker containers or Kubernetes to
+  make good use of the operator.
+* No need to learn more about containers or Kubernetes as a DAG Author. Only knowledge of Python and requirements
+  is needed to author DAGs this way.
+
+The drawbacks:
+
+* Your environment needs to have the virtual environments prepared upfront. This usually means that you
+  cannot change them on the fly; adding new or changing requirements requires at least an Airflow re-deployment,
+  and the iteration time when you work on new versions might be longer.
+* Your Python callable has to be serializable. There are a number of Python objects that are not serializable
+  using the standard ``pickle`` library. You can mitigate some of those limitations by using the ``dill`` library,
+  but even that library does not solve all of the serialization limitations.
+* All dependencies that are not available in the Airflow environment must be imported locally in the callable you
+  use, and the top-level Python code of your DAG should not import/use those libraries.
+* The virtual environments are run in the same operating system, so they cannot have conflicting system-level
+  dependencies (``apt`` or ``yum`` installable packages). Only Python dependencies can be independently
+  installed in those environments.
+* The tasks are only isolated from each other via running in different environments. This means that
+  running tasks can still interfere with each other - for example, subsequent tasks executed on the
+  same worker might be affected by previous tasks creating or modifying files, etc.
+
+You can think of the ``PythonVirtualenvOperator`` and ``PreexistingPythonVirtualenvOperator``
+as counterparts - as a DAG author you would normally iterate on dependencies and develop your DAG using
+``PythonVirtualenvOperator`` (decorating your tasks with the ``@task.virtualenv`` decorator), and once
+that iteration is done you would switch to the ``PreexistingPythonVirtualenvOperator`` for production,
+after your DevOps/System Admin teams deploy your new virtualenv there. The nice thing about this is
+that you can switch the decorator back at any time and continue developing it "dynamically" with
+``PythonVirtualenvOperator``.
+
+
+Using DockerOperator or Kubernetes Pod Operator
+-----------------------------------------------
+
+Another strategy is to use the Docker Operator or the Kubernetes Pod Operator. Those require that Airflow runs in
+a Docker container environment or a Kubernetes environment (or at the very least has access to create and
+run tasks with those).
+
+As with the Python operators, the TaskFlow decorators are handy if you would like to
+use those operators to execute your callable Python code.
+
+This approach is far more involved - you need to understand how Docker/Kubernetes Pods work if you want to use
+it, but the tasks are fully isolated from each other and you are not even limited to running
+Python code. You can write your tasks in any programming language you want. Also, your dependencies are
+fully independent from the Airflow ones (including the system-level dependencies), so if your tasks require
+a very different environment, this is the way to go. The corresponding decorators are ``@task.docker``
+and ``@task.kubernetes``.
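+
+For the Kubernetes side, a minimal sketch of the ``@task.kubernetes`` decorator might look as follows
+(the image name and namespace are hypothetical, and the exact parameter set should be checked against the
+cncf.kubernetes provider documentation):
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    @task.kubernetes(image="my-company/ml-tools:2022-08", namespace="airflow-tasks")
+    def train_model(dataset_uri: str):
+        # every import here only needs to exist inside the container image run as a Pod
+        import sklearn
+
+        return sklearn.__version__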
+
+The benefits of those operators are:
+
+* You can run tasks with different sets of both Python and system-level dependencies, or even tasks
+  written in a completely different language or even for a different processor architecture (x86 vs. arm).
+* The environment used to run the tasks enjoys the optimizations and immutability of containers, where a
+  similar set of dependencies can effectively reuse a number of cached layers of the image, so this
+  approach is optimized for the case where you have multiple similar, but different, environments.
+* The dependencies can be pre-vetted by the admins and your security team, and no unexpected new code will
+  be added dynamically. This is good for both security and stability.
+* Complete isolation between tasks. They cannot influence one another other than through the standard
+  Airflow XCom mechanisms.
+
+The drawbacks:
+
+* There is an overhead to start the tasks. It is usually not as big as when creating virtual environments
+  dynamically, but it is still significant (especially for the Kubernetes Pod Operator).
+* Resource re-use is still OK, but a little less fine-grained than when running tasks via virtual environments.
+  There is an overhead that each running container and Pod introduces, depending on your deployment, and it is
+  generally higher than for a virtual environment task. There is also some duplication of the resources used.
+  process (running in Docker Container or Kubernetes Pod) executing the task, and supervising Airflow
+  Python task that submits the job to Docker/Kubernetes and monitors it's execution.
+* Your environment needs to have the container images prepared upfront. This usually means that you
+  cannot change it on the flight, adding new or changing requirements require at least airflow re-deployment
+  and iteration time when you work on new versions might be much longer. Appropriate deployment pipeline here
+  is a must to be able to reliably maintain your deployment.
+* Your python callable has to be serializable if you want to run it via decorators, also in this case
+  all dependencies that are not available in Airflow environment must be locally imported in the callable you
+  use and the top-level Python code of your DAG should not import/use those libraries.
+* You need to understand more details about how Docker Containers or Kubernetes work. The abstraction
+  provided by those two are "leaky", so you need to understand a bit more about resources, networking,
+  containers etc. in order to Author DAG that uses those operators.
+
+
+Using multiple Docker Images and Celery Queues
+----------------------------------------------
+
+It is possible (though it requires a deep knowledge of Airflow deployment) to run Airflow tasks
+using multiple, independent Docker images. This can be achieved by allocating different tasks to different
+Queues and configuring your Celery workers to use different images for different Queues. This, however
+(at least currently), requires a lot of manual deployment configuration and intrinsic knowledge of how
+Airflow, Celery and Kubernetes work. It also introduces quite some overhead for running the tasks - there
+are fewer chances for resource reuse and it's much more difficult to fine-tune such a deployment for
+resource cost without impacting performance and stability.
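+
+The DAG-side half of this setup is just routing a task to a dedicated queue via the standard ``queue``
+argument; the worker-side half is starting a Celery worker, from an image that contains the extra
+dependencies, listening on that queue. The queue name and image contents below are hypothetical:
+
+.. code-block:: python
+
+    from airflow.decorators import task
+
+
+    @task(queue="ml_queue")  # only Celery workers subscribed to "ml_queue" will pick this task up
+    def heavy_training_step():
+        import tensorflow as tf  # present only in the image used by the "ml_queue" workers
+
+        return tf.__version__
+
+
+    # on the worker built from the specialized image, the worker would be started with e.g.:
+    #   airflow celery worker --queues ml_queue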
+
+One of the possible ways to make it more useful is
+`AIP-46 Runtime isolation for Airflow tasks and DAG parsing <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Runtime+isolation+for+airflow+tasks+and+dag+parsing>`_
+and the completion of `AIP-43 DAG Processor Separation <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation>`_.
+Until those are implemented, there is very little benefit in using this approach and it is not recommended.
+
+When those AIPs are implemented, however, this will open up the possibility of a more multi-tenant approach,
+where multiple teams will be able to have completely isolated set of dependencies that will be used across

Review Comment:
   ```suggestion
   where multiple teams will be able to have completely isolated sets of dependencies that will be used across
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org