Posted to dev@airflow.apache.org by Ash Berlin-Taylor <as...@apache.org> on 2022/09/08 13:59:58 UTC

Apache Airflow 2.4.0b1 available for testing

Hello everyone!

I'm very excited to let you all know that I have just pushed 2.4.0b1 to 
PyPI (Docker images coming soon) and it is now ready for testing. All 
going well, we will have an RC1 next week.

The headline user-facing feature is AIP-48: Data-aware scheduling 
<http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow/latest/concepts/datasets.html>, 
which lets you schedule DAGs based on datasets being updated by other 
DAGs. It is a huge foundational feature for Airflow that we will be 
expanding on over the coming releases:

from airflow import DAG, Dataset

with DAG(...):
    MyOperator(
        # this task updates example.csv
        outlets=[Dataset("s3://dataset-bucket/example.csv")],
        ...,
    )


with DAG(
    # this DAG should be run when example.csv is updated (by dag1)
    schedule=[Dataset("s3://dataset-bucket/example.csv")],
    ...,
):
    ...
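The triggering rule behind the two DAGs above can be sketched in plain Python. This is my own simplification for illustration, not Airflow's implementation: a DAG scheduled on a list of datasets is started once every one of those datasets has been updated since the DAG's last dataset-triggered run.

```python
# Simplified sketch of dataset-triggered scheduling (illustration only,
# not Airflow internals).
def is_ready(scheduled_datasets, updated_since_last_run):
    """A consuming DAG runs once ALL of its scheduled datasets have
    received an update since its last dataset-triggered run."""
    return set(scheduled_datasets) <= set(updated_since_last_run)

# One dataset: a single update from the producing DAG is enough.
print(is_ready(["s3://dataset-bucket/example.csv"],
               ["s3://dataset-bucket/example.csv"]))  # True

# Two datasets: the consuming DAG waits until both have been updated.
print(is_ready(["s3://a/x.csv", "s3://b/y.csv"], ["s3://a/x.csv"]))  # False
```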

This also includes the final bits of AIP-43 
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation>

We have not yet pulled together full release notes, but here are the 
few bits that had newsfragments as we went along (check out 
<http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow/latest/release_notes.html> 
in a little while once the build has updated):

- The DB-related classes ``DBApiHook`` and ``SQLSensor`` have been moved 
to the ``apache-airflow-providers-common-sql`` provider. (NEW)
- DAGs used in a context manager no longer need to be assigned to a 
module variable (#23592)

  Previously you had to assign a DAG to a module-level variable in 
order for Airflow to pick it up. For example, this

  .. code-block:: python

     with DAG(dag_id="example") as dag:
         ...


     @dag
     def dag_maker():
         ...


     dag2 = dag_maker()


  can become

  .. code-block:: python

     with DAG(dag_id="example"):
         ...


     @dag
     def dag_maker():
         ...


     dag_maker()

  If you want to disable the behaviour for any reason, set 
``auto_register=False`` on the DAG:

  .. code-block:: python

     # This DAG is not assigned to a variable, and with auto_register=False
     # Airflow will not pick it up
     with DAG(dag_id="example", auto_register=False):
         ...
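The mechanics can be pictured with a toy context manager. This is my own stand-in, not Airflow's actual code: registration happens on context-manager exit, which is why no module-level assignment is needed, and why an opt-out flag is enough to suppress it.

```python
# Toy stand-in for DAG auto-registration (not Airflow's real code):
# the context manager registers itself on exit unless opted out.
REGISTRY = []

class MiniDag:
    def __init__(self, dag_id, auto_register=True):
        self.dag_id = dag_id
        self.auto_register = auto_register

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.auto_register:
            REGISTRY.append(self.dag_id)
        return False  # never swallow exceptions

with MiniDag("example"):
    pass  # registered without being assigned to a variable

with MiniDag("hidden", auto_register=False):
    pass  # opted out, so not registered

print(REGISTRY)  # ['example']
```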

- DAG runs sorting logic changed in grid view (#25410)

  The ordering of DAG runs in the grid view has been changed to be more 
"natural". The new logic generally orders by data interval, but a 
custom ordering can be applied by setting the DAG to use a custom 
timetable. (#25090)
- Deprecation of ``schedule_interval`` and ``timetable`` arguments

  We added a new DAG argument ``schedule`` that can accept a cron 
expression, timedelta object, *timetable* object, or list of dataset 
objects. The ``schedule_interval`` and ``timetable`` arguments are 
deprecated.

  If you previously used the ``@daily`` cron preset, your DAG may have 
looked like this:

  .. code-block:: python

      with DAG(
          dag_id='my_example',
          start_date=datetime(2021, 1, 1),
          schedule_interval='@daily',
      ):
          ...

  Going forward, you should use the ``schedule`` argument instead:

  .. code-block:: python

      with DAG(
          dag_id='my_example',
          start_date=datetime(2021, 1, 1),
          schedule='@daily',
      ):
          ...

  The same is true if you used a custom timetable. Previously you would 
have used the ``timetable`` argument:

  .. code-block:: python

      with DAG(
          dag_id='my_example',
          start_date=datetime(2021, 1, 1),
          timetable=EventsTimetable(event_dates=[pendulum.datetime(2022, 4, 5)]),
      ):
          ...

  Now you should use the ``schedule`` argument:

  .. code-block:: python

      with DAG(
          dag_id='my_example',
          start_date=datetime(2021, 1, 1),
          schedule=EventsTimetable(event_dates=[pendulum.datetime(2022, 4, 5)]),
      ):
          ...
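  As a migration aid, the renaming can be expressed as a tiny helper. ``migrate_schedule_kwargs`` is a hypothetical name of my own, not an Airflow API; it simply folds either legacy argument into the new ``schedule`` argument:

```python
# Hypothetical migration helper (not part of Airflow): fold the legacy
# schedule_interval/timetable keyword arguments into the new ``schedule``.
def migrate_schedule_kwargs(kwargs):
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    legacy = [k for k in ("schedule_interval", "timetable") if k in kwargs]
    if "schedule" in kwargs and legacy:
        raise TypeError("schedule cannot be combined with legacy arguments")
    if len(legacy) > 1:
        raise TypeError("schedule_interval and timetable are mutually exclusive")
    for key in legacy:
        kwargs["schedule"] = kwargs.pop(key)
    return kwargs

print(migrate_schedule_kwargs({"dag_id": "my_example",
                               "schedule_interval": "@daily"}))
# {'dag_id': 'my_example', 'schedule': '@daily'}
```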

- Removal of experimental Smart Sensors (#25507)

  Smart Sensors were added in 2.0, deprecated in favor of deferrable 
operators in 2.2, and have now been removed.
- The ``airflow.contrib`` packages and deprecated modules from Airflow 
1.10 in the ``airflow.hooks``, ``airflow.operators``, and 
``airflow.sensors`` packages are now dynamically generated modules. 
While users can continue using the deprecated contrib classes, these 
are no longer visible to static code-checking tools and will be 
reported as missing. Users are recommended to move to the 
non-deprecated classes. (#26153, #26179, #26167)
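  The effect described above can be demonstrated with a PEP 562 style module ``__getattr__``. The module and class names below are made up for illustration; only the mechanism matches the release note: attributes generated at runtime resolve fine, but static tools cannot see them.

```python
import sys
import types
import warnings

class NewHook:
    """Stand-in replacement class (hypothetical name)."""

# Build a shim module whose deprecated attributes are generated
# dynamically via a module-level __getattr__ (PEP 562).
shim = types.ModuleType("contrib_shim")

def _module_getattr(name):
    if name == "OldHook":
        warnings.warn("OldHook is deprecated; use NewHook instead",
                      DeprecationWarning, stacklevel=2)
        return NewHook
    raise AttributeError(name)

shim.__getattr__ = _module_getattr
sys.modules["contrib_shim"] = shim

import contrib_shim

# Resolves at runtime, but static checkers would report OldHook as missing.
print(contrib_shim.OldHook is NewHook)  # True
```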


Features
^^^^^^^^

- ``DbApiHook`` accepts a ``log_sql`` argument to turn off logging of 
SQL queries. (#24570)
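The idea can be pictured with a toy hook. ``MiniHook`` is a made-up stand-in, not the real ``DbApiHook``; it only illustrates a per-hook switch that suppresses SQL statement logging without changing what the hook returns.

```python
import logging

class MiniHook:
    """Made-up stand-in for a DB hook with a log_sql switch."""

    def __init__(self, log_sql=True):
        self.log_sql = log_sql
        self.log = logging.getLogger(type(self).__name__)

    def run(self, sql):
        if self.log_sql:
            self.log.info("Running statement: %s", sql)
        return f"executed: {sql}"

quiet = MiniHook(log_sql=False)   # no statement logging
print(quiet.run("SELECT 1"))      # executed: SELECT 1
```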


Improvements
^^^^^^^^^^^^

- The default value for ``[core] hostname_callable`` is now 
``airflow.utils.net.getfqdn``, which should provide a more stable 
canonical host name. You can still use ``socket.getfqdn`` or any other 
``hostname_callable`` you had configured. (#24981)


Bug Fixes
^^^^^^^^^

- ``ExternalTaskSensor`` now supports the ``soft_fail`` flag to skip if 
the external task or DAG enters a failed state. (#23647)
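A simplified sketch of the flag's effect, with my own stand-ins for Airflow's exceptions rather than its actual sensor code: with ``soft_fail=True`` a failed external state leads to a skip instead of a task failure.

```python
class TaskSkipped(Exception):
    """Stand-in for Airflow's skip exception."""

class TaskFailed(Exception):
    """Stand-in for a hard sensor failure."""

def check_external(external_state, soft_fail=False):
    """Simplified sensor check: a failed external state either skips
    the task (soft_fail=True) or fails it (soft_fail=False)."""
    if external_state in ("failed", "upstream_failed"):
        raise TaskSkipped() if soft_fail else TaskFailed()
    return external_state == "success"

try:
    check_external("failed", soft_fail=True)
    outcome = "success"
except TaskSkipped:
    outcome = "skipped"

print(outcome)  # skipped
```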