Posted to commits@airflow.apache.org by "fritz-astronomer (via GitHub)" <gi...@apache.org> on 2023/09/15 20:39:03 UTC

[GitHub] [airflow] fritz-astronomer opened a new pull request, #34410: Docs for triggered_dataset_event

fritz-astronomer opened a new pull request, #34410:
URL: https://github.com/apache/airflow/pull/34410

   - add a `templates.rst` reference for `triggering_dataset_events`
   - add a note on the datasets page to check the templates page




[GitHub] [airflow] fritz-astronomer commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "fritz-astronomer (via GitHub)" <gi...@apache.org>.
fritz-astronomer commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327827097


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   😂 hoisted by my own petard. I feel like future work could be an Airflow-wide macro to make this less bend-over-backwards?
   
   I'm hesitant to add a DAG-specific macro to the example DAG, as that might be "a step too far" when multiple unfamiliar concepts are presented at once: people who just copy-paste the Jinja template, not knowing they also need the special macro, won't understand why things are breaking.
   
   I think the `| first` filters also make the structure of it obvious. So, though ugly, I vaguely prefer being explicit with the ugliness.
   
   re: sorting - I have no idea how the list sorts. Luckily this is an example for a single-dataset schedule, so it doesn't matter 😄 
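
For readers following the `| first | first` discussion: `triggering_dataset_events` maps each triggering dataset's URI to the list of that dataset's events, so the first `first` picks one dataset's event list and the second picks one event from it. A minimal TaskFlow sketch of the same access from Python (the task name is hypothetical; the context variable and its `source_dag_run` attribute are the ones documented in the diff above):

    from airflow.decorators import task

    @task
    def report_interval(triggering_dataset_events=None):
        # Airflow injects `triggering_dataset_events` by parameter name:
        # a mapping of dataset URI -> list of DatasetEvent objects.
        events = next(iter(triggering_dataset_events.values()))  # first `| first`
        run = events[0].source_dag_run  # second `| first`, then its source DAG run
        print(run.data_interval_start, run.data_interval_end)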





[GitHub] [airflow] Taragolis commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327823719


##########
docs/apache-airflow/templates-ref.rst:
##########
@@ -33,25 +33,25 @@ Variables
 The Airflow engine passes a few variables by default that are accessible
 in all templates
 
-=========================================== ===================== ===================================================================
-Variable                                    Type                  Description
-=========================================== ===================== ===================================================================

Review Comment:
   That's up to you. I initially found that the entire table had changed, but it seems some indents were broken and you have already fixed it 👍 





[GitHub] [airflow] fritz-astronomer commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "fritz-astronomer (via GitHub)" <gi...@apache.org>.
fritz-astronomer commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327847108


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   Added the link @Taragolis - that also gives an opportunity to explain what the heck is happening there 😅 




[GitHub] [airflow] fritz-astronomer commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "fritz-astronomer (via GitHub)" <gi...@apache.org>.
fritz-astronomer commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327822689


##########
docs/apache-airflow/templates-ref.rst:
##########
@@ -33,25 +33,25 @@ Variables
 The Airflow engine passes a few variables by default that are accessible
 in all templates
 
-=========================================== ===================== ===================================================================
-Variable                                    Type                  Description
-=========================================== ===================== ===================================================================

Review Comment:
   Agreed. I didn't know that existed! I have a real hard time with `.rst` in general; it's not very intuitive to me and I get tripped up on small stuff. I prefer `.md`, personally.
   
   That is a larger change than I'd like for this specific PR, so I won't attempt to apply it now, but I will put it on my list of things for future PRs.





[GitHub] [airflow] hussein-awala commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327832573


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   > I'm hesitant to add a DAG-specific macro to the example DAG as that might be "a step too far" if people are unfamiliar with multiple concepts being presented at once
   
   I agree, it's better to use a standard/official thing, even if it's ugly





[GitHub] [airflow] dstandish commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "dstandish (via GitHub)" <gi...@apache.org>.
dstandish commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327837233


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   I will just quietly sneak in a change to the macro later ;)





[GitHub] [airflow] Taragolis commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327804993


##########
docs/apache-airflow/templates-ref.rst:
##########
@@ -33,25 +33,25 @@ Variables
 The Airflow engine passes a few variables by default that are accessible
 in all templates
 
-=========================================== ===================== ===================================================================
-Variable                                    Type                  Description
-=========================================== ===================== ===================================================================

Review Comment:
   IMO a [List Table](https://docutils.sourceforge.io/docs/ref/rst/directives.html#list-table) is much easier to create and update than a [Simple Table](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#simple-tables)
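
For illustration, a minimal list-table sketch of the Variable / Type / Description layout from the diff above, with one real row; unlike a simple table, no `=` border widths need realigning when a cell grows:

    .. list-table::
       :header-rows: 1

       * - Variable
         - Type
         - Description
       * - ``{{ data_interval_start }}``
         - pendulum.DateTime
         - Start of the data interval of the run.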





[GitHub] [airflow] dstandish commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "dstandish (via GitHub)" <gi...@apache.org>.
dstandish commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327827537


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   as you wish, as you wish




[GitHub] [airflow] Taragolis merged pull request #34410: Docs for triggered_dataset_event

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis merged PR #34410:
URL: https://github.com/apache/airflow/pull/34410




[GitHub] [airflow] Taragolis commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327834867


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   I'd bet that someone might be confused about what `first` actually does; not all users are familiar with Jinja filters, and many don't know which ones are built in. For example, it might be a good idea to add this link in the code example's comments: https://jinja.palletsprojects.com/en/3.1.x/templates/#jinja-filters.first
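
A sketch of what that could look like inline in the example's SQL (the same query as the diff above, plus explanatory comments):

    SELECT *
    FROM my_db.my_schema.my_table
    -- `first` is Jinja's builtin filter returning the first item of a sequence:
    -- https://jinja.palletsprojects.com/en/3.1.x/templates/#jinja-filters.first
    -- `.values() | first` yields one dataset's event list; the second `| first` yields one event.
    WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
    AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';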





[GitHub] [airflow] dstandish commented on a diff in pull request #34410: Docs for triggered_dataset_event

Posted by "dstandish (via GitHub)" <gi...@apache.org>.
dstandish commented on code in PR #34410:
URL: https://github.com/apache/airflow/pull/34410#discussion_r1327823574


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -197,3 +197,40 @@ Notes on schedules
 The ``schedule`` parameter to your DAG can take either a list of datasets to consume or a timetable-based option. The two cannot currently be mixed.
 
 When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
+
+Fetching information from a Triggering Dataset Event
+----------------------------------------------------
+
+A triggered DAG can fetch information from the Dataset that triggered it using the ``triggering_dataset_events`` template or parameter.
+See more at :ref:`templates-ref`.
+
+Example:
+
+.. code-block:: python
+
+    example_snowflake_dataset = Dataset("snowflake://my_db.my_schema.my_table")
+
+    with DAG(dag_id="load_snowflake_data", schedule="@hourly", ...):
+        SQLExecuteQueryOperator(
+            task_id="load", conn_id="snowflake_default", outlets=[example_snowflake_dataset], ...
+        )
+
+    with DAG(dag_id="query_snowflake_data", schedule=[example_snowflake_dataset], ...):
+        SQLExecuteQueryOperator(
+            task_id="query",
+            conn_id="snowflake_default",
+            sql="""
+              SELECT *
+              FROM my_db.my_schema.my_table
+              WHERE "updated_at" >= '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_start }}'
+              AND "updated_at" < '{{ (triggering_dataset_events.values() | first | first).source_dag_run.data_interval_end }}';

Review Comment:
   To use your words, this is ugly... Maybe it's weird to use a user-defined macro in the example DAG, but it would make this easier to follow, I think, and it's probably the better thing to do.
   
   Do these sort properly? Are we actually getting the first dataset event by time (for the lower bound), and the last one by time (for the upper bound)?
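
For concreteness, a minimal sketch of the user-defined-macro approach being discussed, building on the example in the diff above (the helper name is hypothetical and not part of the PR; `user_defined_macros` is the standard DAG parameter):

    def first_triggering_run(triggering_dataset_events):
        # Event list of the first (here: only) triggering dataset,
        # then that list's first event's source DAG run.
        return next(iter(triggering_dataset_events.values()))[0].source_dag_run

    with DAG(
        dag_id="query_snowflake_data",
        schedule=[example_snowflake_dataset],
        user_defined_macros={"first_triggering_run": first_triggering_run},
    ):
        SQLExecuteQueryOperator(
            task_id="query",
            conn_id="snowflake_default",
            sql="""
              SELECT *
              FROM my_db.my_schema.my_table
              WHERE "updated_at" >= '{{ first_triggering_run(triggering_dataset_events).data_interval_start }}'
              AND "updated_at" < '{{ first_triggering_run(triggering_dataset_events).data_interval_end }}';
            """,
        )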




[GitHub] [airflow] fritz-astronomer commented on pull request #34410: Docs for triggered_dataset_event

Posted by "fritz-astronomer (via GitHub)" <gi...@apache.org>.
fritz-astronomer commented on PR #34410:
URL: https://github.com/apache/airflow/pull/34410#issuecomment-1721897563

   Good to go 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org