Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/16 17:01:42 UTC

[GitHub] [airflow-site] potiuk commented on a diff in pull request #659: Add posts about 2.4 release

potiuk commented on code in PR #659:
URL: https://github.com/apache/airflow-site/pull/659#discussion_r973222313


##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,145 @@
+---
+title: "Apache Airflow 2.4.0: Data-aware scheduling"
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow 2.4.0 contains over 650 "user-facing" commits (excluding commits to providers or chart) and over 870 commits in total since 2.3.0, and includes 50 new features, 99 improvements, 85 bug fixes, and several documentation changes.
+
+**Details**:
+
+📦 PyPI: https://pypi.org/project/apache-airflow/2.4.0/ \
+📚 Docs: https://airflow.apache.org/docs/apache-airflow/2.4.0/ \
+🛠️ Release Notes: https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html \
+🐳 Docker Image: docker pull apache/airflow:2.4.0 \
+🚏 Constraints: https://github.com/apache/airflow/tree/constraints-2.4.0
+
+## Data-aware scheduling (AIP-48)
+
+This one is big. Airflow now has the ability to schedule DAGs based on other tasks updating datasets.
+
+What does this mean, exactly? This great new feature lets DAG authors create smaller, more self-contained DAGs that chain together into a larger data-based workflow. If you are currently using `ExternalTaskSensor` or `TriggerDagRunOperator`, you should take a look at datasets: in most cases you can replace them with something that will speed up scheduling!
+
+But enough talking, let's have a short example. First, let's write a simple DAG with a task called `my_task` that produces a dataset called `my-dataset`:
+
+```python
+from airflow import DAG, Dataset
+from airflow.decorators import task
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='producer', ...):
+    @task(outlets=[dataset])
+    def my_task():
+        ...
+```
+
+And then we can tell Airflow to schedule a DAG whenever this Dataset changes:
+
+```python
+from airflow import DAG, Dataset
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='dataset-consumer', schedule=[dataset]):
+    ...
+```
+
+With these two DAGs in place, the instant `my_task` finishes, Airflow will create a DAG run for the `dataset-consumer` workflow.
+
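To illustrate the idea, the trigger mechanics boil down to a mapping from dataset URIs to the DAGs scheduled on them. This is a hypothetical sketch of the concept, not Airflow's scheduler code, and `register_consumer`/`task_finished` are made-up names:

```python
# Hypothetical sketch of dataset-driven scheduling (not Airflow's real code):
# when a producing task finishes, every DAG scheduled on one of its outlet
# datasets gets a new run.
from collections import defaultdict

consumers = defaultdict(list)  # dataset URI -> dag_ids scheduled on it
triggered = []                 # dag_ids for which a run was created

def register_consumer(dag_id, dataset_uri):
    """Record that dag_id is scheduled on dataset_uri (schedule=[dataset])."""
    consumers[dataset_uri].append(dag_id)

def task_finished(outlets):
    """Called when a task with these outlet datasets completes successfully."""
    for uri in outlets:
        for dag_id in consumers[uri]:
            triggered.append(dag_id)

register_consumer("dataset-consumer", "my-dataset")
task_finished(outlets=["my-dataset"])  # my_task completes
assert triggered == ["dataset-consumer"]
```
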
+If you have the producer and consumer DAGs in different files, you do not need to use the same Dataset object: two `Dataset`s created with the same URI are equal.
+
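That URI-based equality can be modeled with a frozen dataclass that compares only the URI. This is a sketch of the semantics, not Airflow's actual `Dataset` implementation:

```python
# Sketch of URI-based equality (not Airflow's real Dataset class): only the
# uri participates in equality and hashing, so two objects built from the
# same URI in different files compare equal.
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Dataset:
    uri: str
    # extra metadata is excluded from equality/hash comparisons
    extra: Optional[dict] = field(default=None, compare=False)

a = Dataset(uri="my-dataset")
b = Dataset(uri="my-dataset")
assert a == b
assert hash(a) == hash(b)
```
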
+We know that what exists right now won't fit all use cases that people might wish for datasets, and in the coming minor releases (2.5, 2.6, etc.) we will expand and improve upon this foundation.
+
+Datasets represent the abstract concept of a dataset and (for now) do not have any direct read or write capability. In this release we are adding the foundational feature that we will build upon in the future, as part of our goal of smaller releases that get new features into your hands sooner!
+
+For more information on datasets, see the [documentation on Data-aware scheduling][data-aware-scheduling]. That includes details on how datasets are identified (URIs), how you can depend on multiple datasets, and how to think about what a dataset is (hint: don't include "date partitions" in a dataset, it's higher level than that).
+
+[data-aware-scheduling]: https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html
+
+## More improvements to Dynamic Task Mapping (AIP-42)
+
+You asked, we listened. Dynamic task mapping now includes support for:
+
+- `expand_kwargs`: To assign multiple parameters to a non-TaskFlow operator.
+- `zip`: To combine multiple things without cross-product.
+- `map`: To transform the parameters just before the task is run.
+
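The difference between the default cross-product expansion, `zip`, and `map` is analogous to plain Python sequence operations. This is an illustration of the semantics only, not Airflow's mapped-task machinery:

```python
# Illustration of mapped-argument semantics with plain Python
# (not Airflow code): cross product vs. zip vs. map.
paths = ["a.csv", "b.csv"]
dates = ["2022-09-01", "2022-09-02"]

# expand(x=paths, y=dates) takes a cross product: 4 task instances
cross = [(p, d) for p in paths for d in dates]
assert len(cross) == 4

# zip combines pairwise instead, without the cross product: 2 instances
pairwise = list(zip(paths, dates))
assert pairwise == [("a.csv", "2022-09-01"), ("b.csv", "2022-09-02")]

# map transforms each parameter just before the task runs
upper = [p.upper() for p in paths]
assert upper == ["A.CSV", "B.CSV"]
```
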
+For more information on dynamic task mapping, see the new sections of the doc on [Transforming Mapped Data][transforming-mapped-data], [Combining upstream data (aka "zipping")][task-mapping-zip], and [Assigning multiple parameters to a non-TaskFlow operator][expand-kwargs].
+
+[task-mapping-zip]: https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#combining-upstream-data-aka-zipping
+[transforming-mapped-data]: https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#transforming-mapped-data
+[expand-kwargs]: https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#assigning-multiple-parameters-to-a-non-taskflow-operator
+
+## Auto-register DAGs used in a context manager (no more `as dag:` needed)
+
+This one is a small quality-of-life improvement, and I don't want to admit how many times I forgot the `as dag:`, or worse, had `as dag:` repeated.
+
+```python
+with DAG(dag_id="example") as dag:
+    ...
+
+
+@dag
+def dag_maker():
+    ...
+
+
+dag2 = dag_maker()
+```
+
+can become
+
+```python
+with DAG(dag_id="example"):
+    ...
+
+
+@dag
+def my_dag():
+    ...
+
+
+my_dag()
+```
+
+If you want to disable the behaviour for any reason, set `auto_register=False` on the DAG:
+
+```python
+# This dag will not be picked up by Airflow as it's not assigned to a variable
+with DAG(dag_id="example", auto_register=False):
+    ...
+```
+
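One way such context-manager auto-registration can work is for `__exit__` to record the DAG in a module-level registry unless `auto_register` is disabled. This is a minimal hypothetical sketch of the pattern, not Airflow's actual implementation:

```python
# Hypothetical sketch of context-manager auto-registration
# (not Airflow's real DAG class): exiting the `with` block registers
# the DAG unless auto_register=False.
REGISTRY = []

class DAG:
    def __init__(self, dag_id, auto_register=True):
        self.dag_id = dag_id
        self.auto_register = auto_register

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        if self.auto_register:
            REGISTRY.append(self)
        return False  # never swallow exceptions raised in the block

with DAG(dag_id="example"):
    pass  # registered on exit, no `as dag:` needed

with DAG(dag_id="skipped", auto_register=False):
    pass  # opted out, not registered

assert [d.dag_id for d in REGISTRY] == ["example"]
```
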
+## Removal of experimental Smart Sensors feature
+
+Smart Sensors were added in Airflow 2.0 and were deprecated in Airflow 2.2 in favor of deferrable operators. If you are using Smart Sensors, you will have to switch to deferrable operators before you can upgrade to Airflow 2.4.
+
+We're sorry to remove this feature (we didn't do it lightly), but to enable us to continue to grow and evolve Airflow we needed to remove this experimental code. We will only make this sort of change in a minor release for features marked as experimental; any fully supported feature will only ever be removed in a major release.
+

Review Comment:
   ```suggestion
   
   ## Removal of deprecated 1.10 operators, hooks and sensors
   
   Operators, hooks and sensors that were deprecated in 1.10 and lived in the `contrib`, `airflow.operators`, `airflow.hooks` and `airflow.sensors` packages have been removed, and they will no longer be valid for static type checking and auto-completion in your IDE. This is not a breaking change: we've implemented dynamic handling of package attributes, so the operators will continue to work if you use them in your DAGs, but we hope this encourages you to move away from those deprecated operators even more.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org