Posted to commits@airflow.apache.org by "o-nikolas (via GitHub)" <gi...@apache.org> on 2023/07/17 18:27:04 UTC

[GitHub] [airflow] o-nikolas commented on a diff in pull request #32604: Allow configuration to be contributed by providers

o-nikolas commented on code in PR #32604:
URL: https://github.com/apache/airflow/pull/32604#discussion_r1265730580


##########
TESTING.rst:
##########
@@ -55,8 +55,33 @@ Follow the guidelines when writing unit tests:
   tests, so we run Pytest with ``--disable-warnings`` but instead we have ``pytest-capture-warnings`` plugin that
   overrides ``recwarn`` fixture behaviour.
 
-**NOTE:** We plan to convert all unit tests to standard "asserts" semi-automatically, but this will be done later
-in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators.
+
+.. note::
+
+  We are in the process of convert all unit tests to standard "asserts" and pytest fixtures
+  so if you find some tests that are still using classic setUp/tearDown approach or unittest asserts, feel
+  free to convert them to pytest.
+
+Airflow configuration for unit tests
+------------------------------------
+
+Some of the unit tests require special configuration set as "default" one. This is done automatically by
+automatically adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in Pytest auto-used

Review Comment:
   ```suggestion
   Some of the unit tests require special configuration set as the "default". This is done automatically by
    adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in Pytest auto-used
   ```



##########
TESTING.rst:
##########
@@ -55,8 +55,33 @@ Follow the guidelines when writing unit tests:
   tests, so we run Pytest with ``--disable-warnings`` but instead we have ``pytest-capture-warnings`` plugin that
   overrides ``recwarn`` fixture behaviour.
 
-**NOTE:** We plan to convert all unit tests to standard "asserts" semi-automatically, but this will be done later
-in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators.
+
+.. note::
+
+  We are in the process of convert all unit tests to standard "asserts" and pytest fixtures
+  so if you find some tests that are still using classic setUp/tearDown approach or unittest asserts, feel
+  free to convert them to pytest.
+
+Airflow configuration for unit tests
+------------------------------------
+
+Some of the unit tests require special configuration set as "default" one. This is done automatically by
+automatically adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in Pytest auto-used
+fixture. This in turn makes airflow configuration load test configuration from
+``airflow/config_templates/config_unit_test.yml`` - defaults from the test configuration replace the original
+defaults stored in ``airflow/config_templates/config.yml``. If you want to add some test-only configuration,
+you should copy the option definition to ``config_unit_test.yml`` and change default value there.
+
+.. note::
+
+  The test configuration for Airflow before July 2023 was automatically generated in a file named
+  ``AIRFLOW_HOME/unittest.cfg`` and template for it was stored in "config_templates" next to the yaml file,

Review Comment:
   ```suggestion
     ``AIRFLOW_HOME/unittest.cfg`` and the template for it was stored in "config_templates" next to the yaml file,
   ```
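
For context, here is a rough sketch of how an auto-used Pytest fixture can inject that environment variable. This is illustrative only (a hypothetical conftest.py fixture; the name, scope, and placement are assumptions, not the actual fixture in the Airflow test suite):

```python
# conftest.py (hypothetical sketch) -- fixture name and scope are assumptions.
import os

import pytest


@pytest.fixture(autouse=True, scope="session")
def _airflow_unit_test_mode():
    # The variable must be set before the Airflow configuration is first loaded so that
    # the unit-test defaults from config_unit_test.yml are picked up instead of the
    # production defaults; the real setup does this earlier, this is only a sketch.
    os.environ["AIRFLOW__CORE__UNIT_TEST_MODE"] = "True"
    yield
```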



##########
TESTING.rst:
##########
@@ -55,8 +55,33 @@ Follow the guidelines when writing unit tests:
   tests, so we run Pytest with ``--disable-warnings`` but instead we have ``pytest-capture-warnings`` plugin that
   overrides ``recwarn`` fixture behaviour.
 
-**NOTE:** We plan to convert all unit tests to standard "asserts" semi-automatically, but this will be done later
-in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators.
+
+.. note::
+
+  We are in the process of convert all unit tests to standard "asserts" and pytest fixtures
+  so if you find some tests that are still using classic setUp/tearDown approach or unittest asserts, feel
+  free to convert them to pytest.
+
+Airflow configuration for unit tests
+------------------------------------
+
+Some of the unit tests require special configuration set as "default" one. This is done automatically by
+automatically adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in Pytest auto-used
+fixture. This in turn makes airflow configuration load test configuration from
+``airflow/config_templates/config_unit_test.yml`` - defaults from the test configuration replace the original
+defaults stored in ``airflow/config_templates/config.yml``. If you want to add some test-only configuration,
+you should copy the option definition to ``config_unit_test.yml`` and change default value there.
+
+.. note::
+
+  The test configuration for Airflow before July 2023 was automatically generated in a file named
+  ``AIRFLOW_HOME/unittest.cfg`` and template for it was stored in "config_templates" next to the yaml file,
+  however this was only done for the first time you run airflow and you had to manually maintain the file,
+  and it was pretty cumbrsome and arcane knowledge, also this file has been overwritten in Breeze environment

Review Comment:
   ```suggestion
     however this was only done for the first time you run airflow and you had to manually maintain the file,
     and it was pretty cumbersome and arcane knowledge, also this file has been overwritten in Breeze environment
   ```



##########
airflow/cli/cli_config.py:
##########
@@ -90,7 +91,7 @@ def _check_value(self, action, value):
                         f"executors derived from them, your current executor: {executor}, subclassed from: "
                         f'{", ".join([base_cls.__qualname__ for base_cls in executor_cls.__bases__])}'
                     )
-                    raise ArgumentError(action, message)
+                    warnings.warn(message)

Review Comment:
   Most of this code will go away soon anyway when I finish the CLI abstraction for the Airflow CLI (see our [convo here](https://github.com/apache/airflow/pull/29055#discussion_r1082091328)).



##########
airflow/cli/cli_config.py:
##########
@@ -793,6 +814,13 @@ def string_lower_type(val):
     action="store_true",
 )
 
+# IMPORTANT NOTE!
+#
+# Celery configs below have explicit fallback values because celery provider defaults are not yet loaded
+# via provider at the time we parse the command line, so in case it is not set, we need to have manual
+# fallback
+# DO NOT REMOVE THE FALLBACKS even if you are tempted to.
+# TODO: possibly move the commands to providers but that could be big performance hit on the CLI

Review Comment:
   @potiuk 
   
   The AIP-51 PR https://github.com/apache/airflow/pull/29055 will move the commands to the executor modules (which will now live in providers), so this will effectively be handled by that work.
   
   I've spent time optimizing the import times of the executor modules so it should not slow down the CLI much at all.
   
   Was there another concern you had with regard to performance?
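
For anyone skimming the thread, the fallback pattern described in the diff comment boils down to passing an explicit ``fallback`` to ``conf.get`` so the CLI arguments can be built before the Celery provider's defaults are registered. A rough, hedged sketch of the idea (the key and fallback value below are illustrative, not the exact ones used in ``cli_config.py``):

```python
# Illustrative only: the real option names and fallback values live in airflow/cli/cli_config.py.
from airflow.configuration import conf

# Without an explicit fallback, this lookup could fail at CLI parse time because the
# Celery provider's configuration defaults have not been loaded yet.
broker_url = conf.get("celery", "broker_url", fallback="redis://localhost:6379/0")
```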



##########
airflow/config_templates/default_airflow.cfg:
##########
@@ -16,1507 +15,22 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# This is the template for Airflow's default configuration. When Airflow is
-# imported, it looks for a configuration file at $AIRFLOW_HOME/airflow.cfg. If
-# it doesn't exist, Airflow uses this template to generate it by replacing
-# variables in curly braces with their global values from configuration.py.
-
-# Users should not modify this file; they should customize the generated
-# airflow.cfg instead.
-
-
-# ----------------------- TEMPLATE BEGINS HERE -----------------------
-
-[core]
-# The folder where your airflow pipelines live, most likely a
-# subfolder in a code repository. This path must be absolute.
-dags_folder = {AIRFLOW_HOME}/dags
-
-# Hostname by providing a path to a callable, which will resolve the hostname.
-# The format is "package.function".
 #
-# For example, default value "airflow.utils.net.getfqdn" means that result from patched
-# version of socket.getfqdn() - see https://github.com/python/cpython/issues/49254.
+# NOTE IF YOU ARE LOOKING FOR DEFAULT CONFIGURATION FILE HERE !!! LOOK NO MORE. READ NOTE BELOW!
 #
-# No argument should be required in the function specified.
-# If using IP address as hostname is preferred, use value ``airflow.utils.net.get_host_ip_address``
-hostname_callable = airflow.utils.net.getfqdn
-
-# A callable to check if a python file has airflow dags defined or not
-# with argument as: `(file_path: str, zip_file: zipfile.ZipFile | None = None)`
-# return True if it has dags otherwise False
-# If this is not provided, Airflow uses its own heuristic rules.
-might_contain_dag_callable = airflow.utils.file.might_contain_dag_via_default_heuristic
-
-# Default timezone in case supplied date times are naive
-# can be utc (default), system, or any IANA timezone string (e.g. Europe/Amsterdam)
-default_timezone = utc
-
-# The executor class that airflow should use. Choices include
-# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, ``DaskExecutor``,
-# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
-# full import path to the class when using a custom executor.
-executor = SequentialExecutor
-
-# The auth manager class that airflow should use. Full import path to the auth manager class.
-auth_manager = airflow.auth.managers.fab.fab_auth_manager.FabAuthManager
-
-# This defines the maximum number of task instances that can run concurrently per scheduler in
-# Airflow, regardless of the worker count. Generally this value, multiplied by the number of
-# schedulers in your cluster, is the maximum number of task instances with the running
-# state in the metadata database.
-parallelism = 32
-
-# The maximum number of task instances allowed to run concurrently in each DAG. To calculate
-# the number of tasks that is running concurrently for a DAG, add up the number of running
-# tasks for all DAG runs of the DAG. This is configurable at the DAG level with ``max_active_tasks``,
-# which is defaulted as ``max_active_tasks_per_dag``.
+# This file used to have something that reminded default Airflow configuration but it was really a template
+# That was used to generate it and it was confusing if you copied it to your configuration and some
+# configuration keys were wrong.
 #
-# An example scenario when this would be useful is when you want to stop a new dag with an early
-# start date from stealing all the executor slots in a cluster.
-max_active_tasks_per_dag = 16
-
-# Are DAGs paused by default at creation
-dags_are_paused_at_creation = True
-
-# The maximum number of active DAG runs per DAG. The scheduler will not create more DAG runs
-# if it reaches the limit. This is configurable at the DAG level with ``max_active_runs``,
-# which is defaulted as ``max_active_runs_per_dag``.
-max_active_runs_per_dag = 16
-
-# The name of the method used in order to start Python processes via the multiprocessing module.
-# This corresponds directly with the options available in the Python docs:
-# https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method.
-# Must be one of the values returned by:
-# https://docs.python.org/3/library/multiprocessing.html#multiprocessing.get_all_start_methods.
-# Example: mp_start_method = fork
-# mp_start_method =
-
-# Whether to load the DAG examples that ship with Airflow. It's good to
-# get started, but you probably want to set this to ``False`` in a production
-# environment
-load_examples = True
-
-# Path to the folder containing Airflow plugins
-plugins_folder = {AIRFLOW_HOME}/plugins
-
-# Should tasks be executed via forking of the parent process ("False",
-# the speedier option) or by spawning a new python process ("True" slow,
-# but means plugin changes picked up by tasks straight away)
-execute_tasks_new_python_interpreter = False
-
-# Secret key to save connection passwords in the db
-fernet_key = {FERNET_KEY}
-
-# Whether to disable pickling dags
-donot_pickle = True
-
-# How long before timing out a python file import
-dagbag_import_timeout = 30.0
-
-# Should a traceback be shown in the UI for dagbag import errors,
-# instead of just the exception message
-dagbag_import_error_tracebacks = True
-
-# If tracebacks are shown, how many entries from the traceback should be shown
-dagbag_import_error_traceback_depth = 2
-
-# How long before timing out a DagFileProcessor, which processes a dag file
-dag_file_processor_timeout = 50
-
-# The class to use for running task instances in a subprocess.
-# Choices include StandardTaskRunner, CgroupTaskRunner or the full import path to the class
-# when using a custom task runner.
-task_runner = StandardTaskRunner
-
-# If set, tasks without a ``run_as_user`` argument will be run with this user
-# Can be used to de-elevate a sudo user running Airflow when executing tasks
-default_impersonation =
-
-# What security module to use (for example kerberos)
-security =
-
-# Turn unit test mode on (overwrites many configuration options with test
-# values at runtime)
-unit_test_mode = False
-
-# Whether to enable pickling for xcom (note that this is insecure and allows for
-# RCE exploits).
-enable_xcom_pickling = False
-
-# What classes can be imported during deserialization. This is a multi line value.
-# The individual items will be parsed as regexp. Python built-in classes (like dict)
-# are always allowed. Bare "." will be replaced so you can set airflow.* .
-allowed_deserialization_classes = airflow\..*
-
-# When a task is killed forcefully, this is the amount of time in seconds that
-# it has to cleanup after it is sent a SIGTERM, before it is SIGKILLED
-killed_task_cleanup_time = 60
-
-# Whether to override params with dag_run.conf. If you pass some key-value pairs
-# through ``airflow dags backfill -c`` or
-# ``airflow dags trigger -c``, the key-value pairs will override the existing ones in params.
-dag_run_conf_overrides_params = True
-
-# If enabled, Airflow will only scan files containing both ``DAG`` and ``airflow`` (case-insensitive).
-dag_discovery_safe_mode = True
-
-# The pattern syntax used in the ".airflowignore" files in the DAG directories. Valid values are
-# ``regexp`` or ``glob``.
-dag_ignore_file_syntax = regexp
-
-# The number of retries each task is going to have by default. Can be overridden at dag or task level.
-default_task_retries = 0
-
-# The number of seconds each task is going to wait by default between retries. Can be overridden at
-# dag or task level.
-default_task_retry_delay = 300
-
-# The maximum delay (in seconds) each task is going to wait by default between retries.
-# This is a global setting and cannot be overridden at task or DAG level.
-max_task_retry_delay = 86400
-
-# The weighting method used for the effective total priority weight of the task
-default_task_weight_rule = downstream
-
-# The default task execution_timeout value for the operators. Expected an integer value to
-# be passed into timedelta as seconds. If not specified, then the value is considered as None,
-# meaning that the operators are never timed out by default.
-default_task_execution_timeout =
-
-# Updating serialized DAG can not be faster than a minimum interval to reduce database write rate.
-min_serialized_dag_update_interval = 30
-
-# If True, serialized DAGs are compressed before writing to DB.
-# Note: this will disable the DAG dependencies view
-compress_serialized_dags = False
-
-# Fetching serialized DAG can not be faster than a minimum interval to reduce database
-# read rate. This config controls when your DAGs are updated in the Webserver
-min_serialized_dag_fetch_interval = 10
-
-# Maximum number of Rendered Task Instance Fields (Template Fields) per task to store
-# in the Database.
-# All the template_fields for each of Task Instance are stored in the Database.
-# Keeping this number small may cause an error when you try to view ``Rendered`` tab in
-# TaskInstance view for older tasks.
-max_num_rendered_ti_fields_per_task = 30
-
-# On each dagrun check against defined SLAs
-check_slas = True
-
-# Path to custom XCom class that will be used to store and resolve operators results
-# Example: xcom_backend = path.to.CustomXCom
-xcom_backend = airflow.models.xcom.BaseXCom
-
-# By default Airflow plugins are lazily-loaded (only loaded when required). Set it to ``False``,
-# if you want to load plugins whenever 'airflow' is invoked via cli or loaded from module.
-lazy_load_plugins = True
-
-# By default Airflow providers are lazily-discovered (discovery and imports happen only when required).
-# Set it to False, if you want to discover providers whenever 'airflow' is invoked via cli or
-# loaded from module.
-lazy_discover_providers = True
-
-# Hide sensitive Variables or Connection extra json keys from UI and task logs when set to True
-#
-# (Connection passwords are always hidden in logs)
-hide_sensitive_var_conn_fields = True
-
-# A comma-separated list of extra sensitive keywords to look for in variables names or connection's
-# extra JSON.
-sensitive_var_conn_names =
-
-# Task Slot counts for ``default_pool``. This setting would not have any effect in an existing
-# deployment where the ``default_pool`` is already created. For existing deployments, users can
-# change the number of slots using Webserver, API or the CLI
-default_pool_task_slot_count = 128
-
-# The maximum list/dict length an XCom can push to trigger task mapping. If the pushed list/dict has a
-# length exceeding this value, the task pushing the XCom will be failed automatically to prevent the
-# mapped tasks from clogging the scheduler.
-max_map_length = 1024
-
-# The default umask to use for process when run in daemon mode (scheduler, worker,  etc.)
-#
-# This controls the file-creation mode mask which determines the initial value of file permission bits
-# for newly created files.
+# Airflow will generate default configuration for you when you run it for the first time in

Review Comment:
   ```suggestion
   # Airflow will generate the default configuration for you when you run it for the first time in
   ```



##########
airflow/config_templates/default_airflow.cfg:
##########
@@ -16,1507 +15,22 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# This is the template for Airflow's default configuration. When Airflow is
-# imported, it looks for a configuration file at $AIRFLOW_HOME/airflow.cfg. If
-# it doesn't exist, Airflow uses this template to generate it by replacing
-# variables in curly braces with their global values from configuration.py.
-
-# Users should not modify this file; they should customize the generated
-# airflow.cfg instead.
-
-
-# ----------------------- TEMPLATE BEGINS HERE -----------------------
-
-[core]
-# The folder where your airflow pipelines live, most likely a
-# subfolder in a code repository. This path must be absolute.
-dags_folder = {AIRFLOW_HOME}/dags
-
-# Hostname by providing a path to a callable, which will resolve the hostname.
-# The format is "package.function".
 #
-# For example, default value "airflow.utils.net.getfqdn" means that result from patched
-# version of socket.getfqdn() - see https://github.com/python/cpython/issues/49254.
+# NOTE IF YOU ARE LOOKING FOR DEFAULT CONFIGURATION FILE HERE !!! LOOK NO MORE. READ NOTE BELOW!
 #
-# No argument should be required in the function specified.
-# If using IP address as hostname is preferred, use value ``airflow.utils.net.get_host_ip_address``
-hostname_callable = airflow.utils.net.getfqdn
-
-# A callable to check if a python file has airflow dags defined or not
-# with argument as: `(file_path: str, zip_file: zipfile.ZipFile | None = None)`
-# return True if it has dags otherwise False
-# If this is not provided, Airflow uses its own heuristic rules.
-might_contain_dag_callable = airflow.utils.file.might_contain_dag_via_default_heuristic
-
-# Default timezone in case supplied date times are naive
-# can be utc (default), system, or any IANA timezone string (e.g. Europe/Amsterdam)
-default_timezone = utc
-
-# The executor class that airflow should use. Choices include
-# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, ``DaskExecutor``,
-# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
-# full import path to the class when using a custom executor.
-executor = SequentialExecutor
-
-# The auth manager class that airflow should use. Full import path to the auth manager class.
-auth_manager = airflow.auth.managers.fab.fab_auth_manager.FabAuthManager
-
-# This defines the maximum number of task instances that can run concurrently per scheduler in
-# Airflow, regardless of the worker count. Generally this value, multiplied by the number of
-# schedulers in your cluster, is the maximum number of task instances with the running
-# state in the metadata database.
-parallelism = 32
-
-# The maximum number of task instances allowed to run concurrently in each DAG. To calculate
-# the number of tasks that is running concurrently for a DAG, add up the number of running
-# tasks for all DAG runs of the DAG. This is configurable at the DAG level with ``max_active_tasks``,
-# which is defaulted as ``max_active_tasks_per_dag``.
+# This file used to have something that reminded default Airflow configuration but it was really a template
+# That was used to generate it and it was confusing if you copied it to your configuration and some
+# configuration keys were wrong.

Review Comment:
   ```suggestion
   # This file used to have something that was similar to the default Airflow configuration, but it was really just a template.
   # It was used to generate the final configuration, and it was confusing if you copied it to your configuration and some of the
   # keys were wrong.
   ```



##########
TESTING.rst:
##########
@@ -55,8 +55,33 @@ Follow the guidelines when writing unit tests:
   tests, so we run Pytest with ``--disable-warnings`` but instead we have ``pytest-capture-warnings`` plugin that
   overrides ``recwarn`` fixture behaviour.
 
-**NOTE:** We plan to convert all unit tests to standard "asserts" semi-automatically, but this will be done later
-in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators.
+
+.. note::
+
+  We are in the process of convert all unit tests to standard "asserts" and pytest fixtures
+  so if you find some tests that are still using classic setUp/tearDown approach or unittest asserts, feel
+  free to convert them to pytest.
+
+Airflow configuration for unit tests
+------------------------------------
+
+Some of the unit tests require special configuration set as "default" one. This is done automatically by
+automatically adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in Pytest auto-used
+fixture. This in turn makes airflow configuration load test configuration from
+``airflow/config_templates/config_unit_test.yml`` - defaults from the test configuration replace the original
+defaults stored in ``airflow/config_templates/config.yml``. If you want to add some test-only configuration,
+you should copy the option definition to ``config_unit_test.yml`` and change default value there.

Review Comment:
   ```suggestion
   you should copy the option definition to ``config_unit_test.yml`` and change the default value there.
   ```
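
A quick way to sanity-check that the unit-test defaults are in effect is a test like the rough sketch below (a hypothetical test, not part of the Airflow test suite, assuming the auto-used fixture has already set ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` before the configuration was loaded):

```python
# Hypothetical sanity check, illustrative only.
from airflow.configuration import conf


def test_unit_test_mode_is_enabled():
    # In unit-test mode the defaults come from config_unit_test.yml rather than config.yml.
    assert conf.getboolean("core", "unit_test_mode") is True
```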


