You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 13:59:30 UTC

[GitHub] [airflow] ashb opened a new pull request #21495: 🚧 Change default log filename template to include map_index

ashb opened a new pull request #21495:
URL: https://github.com/apache/airflow/pull/21495


   With the recently added LogTemplate mechanism old TIs will still use the format they had at creation time (with the change here to ensure that we create a LogTemplate row for the just-upgraded-in-place) so the logs can still be viewed in the UI.
   
   And since it was now getting quite "deep" I have chosen to "label" the components in the "hive partition style".
   
   I'll need to test this change with Elasticsearch too
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#issuecomment-1040158043


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805963797



##########
File path: airflow/models/mappedoperator.py
##########
@@ -196,6 +196,9 @@ class MappedOperator(AbstractOperator):
     upstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
     downstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
 
+    inlets: list = attr.ib(factory=list)
+    outlets: list = attr.ib(factory=list)

Review comment:
       Properties looking at where? `parital_kwargs`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806062522



##########
File path: airflow/models/taskinstance.py
##########
@@ -1836,7 +1838,7 @@ def get_template_context(
             params.update(task.params)
         if conf.getboolean('core', 'dag_run_conf_overrides_params'):
             self.overwrite_params_with_dag_run_conf(params=params, dag_run=dag_run)
-        validated_params = task.params = params.validate()
+        validated_params = params.validate()

Review comment:
       Was setting `task.params` here unintended or simply unnecessary?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805950187



##########
File path: airflow/models/mappedoperator.py
##########
@@ -196,6 +196,9 @@ class MappedOperator(AbstractOperator):
     upstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
     downstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
 
+    inlets: list = attr.ib(factory=list)
+    outlets: list = attr.ib(factory=list)

Review comment:
       I think these should be properties. Also probably don’t need to be declared in AbstractOperator?

##########
File path: airflow/models/mappedoperator.py
##########
@@ -196,6 +196,9 @@ class MappedOperator(AbstractOperator):
     upstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
     downstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
 
+    inlets: list = attr.ib(factory=list)
+    outlets: list = attr.ib(factory=list)

Review comment:
       Yeah and populate the partial_kwargs key in `BaseOperator.partial` and `_TaskDecorator.map`.

##########
File path: airflow/decorators/base.py
##########
@@ -309,6 +309,8 @@ def map(self, **kwargs) -> XComArg:
             partial_kwargs.get("retry_delay", DEFAULT_RETRY_DELAY),
         )
         partial_kwargs.setdefault("executor_config", {})
+        partial_kwargs.setdefault("inlets", [])
+        partial_kwargs.setdefault("outlets", [])

Review comment:
       Should these be None?

##########
File path: airflow/models/taskinstance.py
##########
@@ -1836,7 +1838,7 @@ def get_template_context(
             params.update(task.params)
         if conf.getboolean('core', 'dag_run_conf_overrides_params'):
             self.overwrite_params_with_dag_run_conf(params=params, dag_run=dag_run)
-        validated_params = task.params = params.validate()
+        validated_params = params.validate()

Review comment:
       Was setting `task.params` here unintended or simply unnecessary?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806010072



##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag_id={{{{ ti.dag_id }}}}/run_id={{{{ ti.run_id }}}}/task_id={{{{ ti.task_id }}}}/\
+               {{%% if ti.map_index != -1 %%}}map_index={{{{ ti.map_index }}}}/{{%% endif %%}}\

Review comment:
       `map_index >= 0` in this case, but yeah good idea.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805946418



##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag_id={{{{ ti.dag_id }}}}/run_id={{{{ ti.run_id }}}}/task_id={{{{ ti.task_id }}}}/\
+               {{%% if ti.map_index != -1 %%}}map_index={{{{ ti.map_index }}}}/{{%% endif %%}}\

Review comment:
       It may be better to standardise the check to `map_index < 0` instead; IMO that’s more β€œlogical” than specifically check for `-1`.
   
   Of course, we shouldn’t have anything lower than -1, but in the very weird chance we hit one rouge value, it’s likely better to treat it as unmapped, since treating it as mapped would cause a crash later that’s difficult to debug.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806220640



##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       It does, yes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806220771



##########
File path: docs/apache-airflow/logging-monitoring/logging-tasks.rst
##########
@@ -38,7 +38,12 @@ directory.
 .. note::
     For more information on setting the configuration, see :doc:`/howto/set-config`
 
-The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
+The default pattern is followed while naming log files form tasks:
+
+- For normal tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log``.
+- For dynamically mapped tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/map_index={map_index}/attempt={try_number}.log``.

Review comment:
       🀦🏻 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: 🚧 Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r804590186



##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag={{{{ ti.dag_id }}}}/task_id={{{{ ti.task_id }}}}/run_id={{{{ ti.run_id }}}}/\

Review comment:
       Started discussion/lazy consensus on list https://lists.apache.org/thread/omt2zn02o3wcxtsg7zbr5yjxdo1offvd




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#issuecomment-1040158043


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806185274



##########
File path: docs/apache-airflow/logging-monitoring/logging-tasks.rst
##########
@@ -38,7 +38,12 @@ directory.
 .. note::
     For more information on setting the configuration, see :doc:`/howto/set-config`
 
-The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
+The default pattern is followed while naming log files form tasks:
+
+- For normal tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log``.
+- For dynamically mapped tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/map_index={map_index}/attempt={try_number}.log``.

Review comment:
       ```suggestion
   - For normal tasks: ``dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log``.
   - For dynamically mapped tasks: ``dag_id={dag_id}/run_id={run_id}/task_id={task_id}/map_index={map_index}/attempt={try_number}.log``.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806061684



##########
File path: airflow/decorators/base.py
##########
@@ -309,6 +309,8 @@ def map(self, **kwargs) -> XComArg:
             partial_kwargs.get("retry_delay", DEFAULT_RETRY_DELAY),
         )
         partial_kwargs.setdefault("executor_config", {})
+        partial_kwargs.setdefault("inlets", [])
+        partial_kwargs.setdefault("outlets", [])

Review comment:
       Should these be None?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806081909



##########
File path: airflow/decorators/base.py
##########
@@ -309,6 +309,8 @@ def map(self, **kwargs) -> XComArg:
             partial_kwargs.get("retry_delay", DEFAULT_RETRY_DELAY),
         )
         partial_kwargs.setdefault("executor_config", {})
+        partial_kwargs.setdefault("inlets", [])
+        partial_kwargs.setdefault("outlets", [])

Review comment:
       No, cos of this in BaseOperator init:
   
   ```        # Lineage
           self.inlets: List = []
           self.outlets: List = []
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806185745



##########
File path: docs/apache-airflow/logging-monitoring/logging-tasks.rst
##########
@@ -38,7 +38,12 @@ directory.
 .. note::
     For more information on setting the configuration, see :doc:`/howto/set-config`
 
-The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
+The default pattern is followed while naming log files form tasks:

Review comment:
       ```suggestion
   The default pattern is followed while naming log files for tasks:
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806725067



##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       Er, no. Reverted this change. (I was thinking wrong class)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: 🚧 Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r803819020



##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag={{{{ ti.dag_id }}}}/task_id={{{{ ti.task_id }}}}/run_id={{{{ ti.run_id }}}}/\

Review comment:
       Thinking about this now we should probably change it to dag->run->task->map->try so that the order is least significant to most significant.
   
   What do people think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21495: 🚧 Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#issuecomment-1036445348


   Example of current layout:
   
   ```
   $ tree ~/airflow/logs
   /home/ash/airflow/logs
   β”œβ”€β”€ dag_id=maptest
   β”‚   β”œβ”€β”€ run_id=backfill__2022-02-10T00:00:00+00:00
   β”‚   β”‚   β”œβ”€β”€ task_id=consumer
   β”‚   β”‚   β”‚   β”œβ”€β”€ map_index=0
   β”‚   β”‚   β”‚   β”‚   └── attempt=1.log
   β”‚   β”‚   β”‚   β”œβ”€β”€ map_index=1
   β”‚   β”‚   β”‚   β”‚   └── attempt=1.log
   β”‚   β”‚   β”‚   └── map_index=2
   β”‚   β”‚   β”‚       └── attempt=1.log
   β”‚   β”‚   └── task_id=make_list
   β”‚   β”‚       └── attempt=1.log
   └── scheduler
       β”œβ”€β”€ 2022-02-10
       └── latest -> /home/ash/airflow/logs/scheduler/2022-02-10
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805969856



##########
File path: airflow/models/mappedoperator.py
##########
@@ -196,6 +196,9 @@ class MappedOperator(AbstractOperator):
     upstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
     downstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
 
+    inlets: list = attr.ib(factory=list)
+    outlets: list = attr.ib(factory=list)

Review comment:
       Yeah and populate the partial_kwargs key in `BaseOperator.partial` and `_TaskDecorator.map`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805959617



##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag_id={{{{ ti.dag_id }}}}/run_id={{{{ ti.run_id }}}}/task_id={{{{ ti.task_id }}}}/\
+               {{%% if ti.map_index != -1 %%}}map_index={{{{ ti.map_index }}}}/{{%% endif %%}}\

Review comment:
       worth updating description key too so users are aware about map_index and https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806081503



##########
File path: airflow/models/taskinstance.py
##########
@@ -1836,7 +1838,7 @@ def get_template_context(
             params.update(task.params)
         if conf.getboolean('core', 'dag_run_conf_overrides_params'):
             self.overwrite_params_with_dag_run_conf(params=params, dag_run=dag_run)
-        validated_params = task.params = params.validate()
+        validated_params = params.validate()

Review comment:
       I forget the error, but setting it here was giving some kind of error (I think from it being set _twice_ on the same in-memory object, once in the supervisor, and once in the actual runner.)
   
   So I took the approach that `get_*` should never have any sideffects!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805959617



##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag_id={{{{ ti.dag_id }}}}/run_id={{{{ ti.run_id }}}}/task_id={{{{ ti.task_id }}}}/\
+               {{%% if ti.map_index != -1 %%}}map_index={{{{ ti.map_index }}}}/{{%% endif %%}}\

Review comment:
       worth updating description key too so users are aware about map_index and https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html too

##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       Does `SensorInstance` have a `ti_key` attribute?

##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       (I couldn't find `si.key` too)

##########
File path: docs/apache-airflow/logging-monitoring/logging-tasks.rst
##########
@@ -38,7 +38,12 @@ directory.
 .. note::
     For more information on setting the configuration, see :doc:`/howto/set-config`
 
-The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
+The default pattern is followed while naming log files form tasks:
+
+- For normal tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log``.
+- For dynamically mapped tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/map_index={map_index}/attempt={try_number}.log``.

Review comment:
       ```suggestion
   - For normal tasks: ``dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log``.
   - For dynamically mapped tasks: ``dag_id={dag_id}/run_id={run_id}/task_id={task_id}/map_index={map_index}/attempt={try_number}.log``.
   ```

##########
File path: docs/apache-airflow/logging-monitoring/logging-tasks.rst
##########
@@ -38,7 +38,12 @@ directory.
 .. note::
     For more information on setting the configuration, see :doc:`/howto/set-config`
 
-The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
+The default pattern is followed while naming log files form tasks:

Review comment:
       ```suggestion
   The default pattern is followed while naming log files for tasks:
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805963797



##########
File path: airflow/models/mappedoperator.py
##########
@@ -196,6 +196,9 @@ class MappedOperator(AbstractOperator):
     upstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
     downstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
 
+    inlets: list = attr.ib(factory=list)
+    outlets: list = attr.ib(factory=list)

Review comment:
       Properties looking at where? `parital_kwargs`?

##########
File path: airflow/config_templates/config.yml
##########
@@ -598,7 +598,9 @@
       version_added: 2.0.0
       type: string
       example: ~
-      default: "{{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log"
+      default: "dag_id={{{{ ti.dag_id }}}}/run_id={{{{ ti.run_id }}}}/task_id={{{{ ti.task_id }}}}/\
+               {{%% if ti.map_index != -1 %%}}map_index={{{{ ti.map_index }}}}/{{%% endif %%}}\

Review comment:
       `map_index >= 0` in this case, but yeah good idea.

##########
File path: airflow/models/taskinstance.py
##########
@@ -1836,7 +1838,7 @@ def get_template_context(
             params.update(task.params)
         if conf.getboolean('core', 'dag_run_conf_overrides_params'):
             self.overwrite_params_with_dag_run_conf(params=params, dag_run=dag_run)
-        validated_params = task.params = params.validate()
+        validated_params = params.validate()

Review comment:
       I forget the error, but setting it here was giving some kind of error (I think from it being set _twice_ on the same in-memory object, once in the supervisor, and once in the actual runner.)
   
   So I took the approach that `get_*` should never have any sideffects!

##########
File path: airflow/decorators/base.py
##########
@@ -309,6 +309,8 @@ def map(self, **kwargs) -> XComArg:
             partial_kwargs.get("retry_delay", DEFAULT_RETRY_DELAY),
         )
         partial_kwargs.setdefault("executor_config", {})
+        partial_kwargs.setdefault("inlets", [])
+        partial_kwargs.setdefault("outlets", [])

Review comment:
       No, cos of this in BaseOperator init:
   
   ```        # Lineage
           self.inlets: List = []
           self.outlets: List = []
   ```

##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       It does, yes.

##########
File path: docs/apache-airflow/logging-monitoring/logging-tasks.rst
##########
@@ -38,7 +38,12 @@ directory.
 .. note::
     For more information on setting the configuration, see :doc:`/howto/set-config`
 
-The following convention is followed while naming logs: ``{dag_id}/{task_id}/{logical_date}/{try_number}.log``
+The default pattern is followed while naming log files form tasks:
+
+- For normal tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log``.
+- For dynamically mapped tasks: ``dag_id{dag_id}/run_id={run_id}/task_id={task_id}/map_index={map_index}/attempt={try_number}.log``.

Review comment:
       🀦🏻 

##########
File path: airflow/decorators/base.py
##########
@@ -309,6 +309,8 @@ def map(self, **kwargs) -> XComArg:
             partial_kwargs.get("retry_delay", DEFAULT_RETRY_DELAY),
         )
         partial_kwargs.setdefault("executor_config", {})
+        partial_kwargs.setdefault("inlets", [])
+        partial_kwargs.setdefault("outlets", [])

Review comment:
       Or not. I was wrong and these aren't needed at all actually.

##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       Er, no. Reverted this change. (I was thinking wrong class)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #21495:
URL: https://github.com/apache/airflow/pull/21495


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#issuecomment-1040090317


   @kaxil @uranusjr Final(?) review please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #21495:
URL: https://github.com/apache/airflow/pull/21495


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806178995



##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       Does `SensorInstance` have a `ti_key` attribute?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806179717



##########
File path: airflow/sensors/smart_sensor.py
##########
@@ -341,21 +340,21 @@ def _load_sensor_works(self, session=None):
                 .filter(SI.state == State.SENSING)
                 .filter(SI.shardcode < self.shard_max, SI.shardcode >= self.shard_min)
             )
-            tis = query.all()
+            sis = query.all()
 
-        self.log.info("Performance query %s tis, time: %.3f", len(tis), timer.duration)
+        self.log.info("Performance query %s sis, time: %.3f", len(sis), timer.duration)
 
         # Query without checking dagrun state might keep some failed dag_run tasks alive.
         # Join with DagRun table will be very slow based on the number of sensor tasks we
         # need to handle. We query all smart tasks in this operator
         # and expect scheduler correct the states in _change_state_for_tis_without_dagrun()
 
         sensor_works = []
-        for ti in tis:
+        for si in sis:
             try:
-                sensor_works.append(SensorWork(ti))
+                sensor_works.append(SensorWork(si))
             except Exception:
-                self.log.exception("Exception at creating sensor work for ti %s", ti.key)
+                self.log.exception("Exception at creating sensor work for ti %s", si.ti_key)

Review comment:
       (I couldn't find `si.key` too)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#issuecomment-1040090317


   @kaxil @uranusjr Final(?) review please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r806261134



##########
File path: airflow/decorators/base.py
##########
@@ -309,6 +309,8 @@ def map(self, **kwargs) -> XComArg:
             partial_kwargs.get("retry_delay", DEFAULT_RETRY_DELAY),
         )
         partial_kwargs.setdefault("executor_config", {})
+        partial_kwargs.setdefault("inlets", [])
+        partial_kwargs.setdefault("outlets", [])

Review comment:
       Or not. I was wrong and these aren't needed at all actually.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21495: Change default log filename template to include map_index

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21495:
URL: https://github.com/apache/airflow/pull/21495#discussion_r805950187



##########
File path: airflow/models/mappedoperator.py
##########
@@ -196,6 +196,9 @@ class MappedOperator(AbstractOperator):
     upstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
     downstream_task_ids: Set[str] = attr.ib(factory=set, init=False)
 
+    inlets: list = attr.ib(factory=list)
+    outlets: list = attr.ib(factory=list)

Review comment:
       I think these should be properties. Also probably don’t need to be declared in AbstractOperator?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org