Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/16 13:35:18 UTC

[GitHub] [airflow-site] norm commented on a diff in pull request #659: Add posts about 2.4 release

norm commented on code in PR #659:
URL: https://github.com/apache/airflow-site/pull/659#discussion_r973031637


##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,140 @@
+---
+title: "Apache Airflow 2.4.0: Data
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow 2.4.0 contains over 650 "user-facing" commits (excluding commits to providers or chart) and over 870 in total since 2.3.0, and includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.
+
+**Details**:
+
+📦 PyPI: https://pypi.org/project/apache-airflow/2.4.0/ \
+📚 Docs: https://airflow.apache.org/docs/apache-airflow/2.4.0/ \
+🛠️ Release Notes: https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html \
+🐳 Docker Image: docker pull apache/airflow:2.4.0 \
+🚏 Constraints: https://github.com/apache/airflow/tree/constraints-2.4.0
+
+
+## Data-aware scheduling (AIP-48)
+
+This one is big. Airflow now has the ability to schedule DAGs based on other tasks updating datasets.
+
+What does this mean, exactly? This great new feature lets DAG authors create smaller, more self-contained DAGs that chain together into a larger data-based workflow. If you are currently using `ExternalTaskSensor` or `TriggerDagRunOperator`, you should take a look at datasets -- in most cases you can replace them with something that will speed up scheduling!
+
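+For comparison, here is a rough sketch of the explicit-trigger style that datasets can often replace (the task and DAG ids below are illustrative, not from this post):
+
+```python
+from airflow.operators.trigger_dagrun import TriggerDagRunOperator
+
+# In the upstream DAG: the producer has to know and name the downstream DAG
+trigger = TriggerDagRunOperator(
+    task_id='trigger_consumer',
+    trigger_dag_id='dataset-consumer',
+)
+```
+
+With datasets, the producing DAG only declares what it produces and never needs to know which DAGs consume it.
+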
+But enough talking, let's look at a short example. First, let's create the task that produces the dataset:
+
+```python
+from airflow import DAG, Dataset
+from airflow.decorators import task
+
+# A dataset is identified by its URI
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='producer', ...):
+    # Declaring the dataset as an outlet marks this task as a producer of it
+    @task(outlets=[dataset])
+    def my_task():
+        ...
+
+    my_task()  # instantiate the task in the DAG
+```
+
+And then we can tell Airflow to schedule a DAG whenever this Dataset changes:
+
+```python
+from airflow import DAG, Dataset
+
+dataset = Dataset(uri='my-dataset')
+
+# A list of datasets as the schedule means: run this DAG whenever
+# any of these datasets is updated by a producing task
+with DAG(dag_id='dataset-consumer', schedule=[dataset]):
+    ...
+```
+
+With these two DAGs in place, the instant `my_task` finishes, Airflow will create the DAG run for the `dataset-consumer` workflow.
+
+(If you have the producer and consumer in different files, you do not need to use the same Dataset object: two `Dataset()`s created with the same URI are equal.)
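+
+A quick way to convince yourself of that equality (a minimal sketch, assuming the default `extra` value on both objects):
+
+```python
+from airflow import Dataset
+
+# Two Dataset objects built from the same URI compare equal, so the
+# producer and consumer DAG files can each construct their own instance
+assert Dataset(uri='my-dataset') == Dataset(uri='my-dataset')
+```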

Review Comment:
   Also produce**r**


