You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 21:28:28 UTC

[GitHub] [airflow] uranusjr opened a new pull request #20435: Also track task_log_prefix_template changes

uranusjr opened a new pull request #20435:
URL: https://github.com/apache/airflow/pull/20435


   Extending #20165 because it turns out tracking just `log_filename_template` is not enough.
   
   The LogFilename model is renamed to LogTemplate, and now contains two template fields `filename` (for `log_filename_template`) and `task_prefix` (for `task_log_prefix_template`). The two fields are tracked together when Airflow launches as previously.
   
   The custom task log formatter is updated to use the database entry instead of the configuration, similar to the log filename renderer.
   
   Since the previous change has not been released, I opted to update the migration in-place instead of creating a new one.
   
   Also added some more tests on `task_log_prefix_template`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772735437



##########
File path: UPDATING.md
##########
@@ -89,11 +89,11 @@ Smart sensors, an "early access" feature added in Airflow 2, are now deprecated
 
 See [Migrating to Deferrable Operators](https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/smart-sensors.html#migrating-to-deferrable-operators) for details on how to migrate.
 
-### Task log filenames are not rendered from database entry instead of config value
+### Task log templates are now read from the metadatabase instead of `airflow.cfg`
 
-Previously, filename of a task’s log is dynamically rendered from the ``[core] log_filename_template`` config value at runtime. This resulted in unfortunate characteristics like it is inpractical to modify the config value after an Airflow instance is running for a while, since all existing task logs have be saved under the previous format and cannot be found with the new config value.
+Previously, a task’s log is dynamically rendered from the `[core] log_filename_template` and `[core] task_log_prefix_template` config values at runtime. This resulted in unfortunate characteristics, e.g. it is impractical to modify the config value after an Airflow instance is running for a while, since all existing task logs have be saved under the previous format and cannot be found with the new config value.

Review comment:
       ```suggestion
   Previously, a task’s log is dynamically rendered from the `[core] log_filename_template` and `[core] task_log_prefix_template` config values at runtime. This resulted in unfortunate characteristics, e.g. it is impractical to modify the config value after an Airflow instance is running for a while, since all existing task logs have been saved under the previous format and cannot be found with the new config value.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772943562



##########
File path: airflow/models/dagrun.py
##########
@@ -939,14 +939,27 @@ def schedule_tis(self, schedulable_tis: Iterable[TI], session: Session = NEW_SES
         return count
 
     @provide_session
-    def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> Optional[str]:
-        if self.log_filename_id is None:  # DagRun created before LogFilename introduction.
-            template = session.query(LogFilename.template).order_by(LogFilename.id).limit(1).scalar()
+    def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> str:
+        if self.log_template_id is None:  # DagRun created before LogTemplate introduction.
+            template = session.query(LogTemplate.filename).order_by(LogTemplate.id).limit(1).scalar()
         else:
-            template = session.query(LogFilename.template).filter_by(id=self.log_filename_id).scalar()
+            template = session.query(LogTemplate.filename).filter_by(id=self.log_template_id).scalar()
         if template is None:
             raise AirflowException(
-                f"No log_filename entry found for ID {self.log_filename_id!r}. "
+                f"No log_template entry found for ID {self.log_template_id!r}. "
+                f"Please make sure you set up the metadatabase correctly."

Review comment:
       Should we turn it into a wanrning and fall-back to the first log template id ?
   
   I think exception on missing log id is a bit too much as it might result in "stack-trace error" (maybe that is intended though?).
   
   Also maybe the message could be somehow more specific. As the current one is not really actionable. I presume at the first start of airlfow there will be at least one entry created, so I guess the only case it can happen is when `log_template_id`  is set in DagRun but missing in the LogTemplate. 
   
   Maybe we should be very explicit "Your dag_run table is corrupted. The entry in your dag_run table with dag_run_id: <id> contains invalid log_template_id: <log_template_id>. You should set log_template_id for this field to one of the existing log_template_ids or set to NULL." . Or something like that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#issuecomment-998315047


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr merged pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
uranusjr merged pull request #20435:
URL: https://github.com/apache/airflow/pull/20435


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772681804



##########
File path: UPDATING.md
##########
@@ -89,11 +89,11 @@ Smart sensors, an "early access" feature added in Airflow 2, are now deprecated
 
 See [Migrating to Deferrable Operators](https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/smart-sensors.html#migrating-to-deferrable-operators) for details on how to migrate.
 
-### Task log filenames are not rendered from database entry instead of config value
+### Task log templates are now read from the metadatabase instead of `airflow.ini`

Review comment:
       Oops, thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772934591



##########
File path: docs/apache-airflow/migrations-ref.rst
##########
@@ -23,7 +23,8 @@ Here's the list of all the Database Migrations that are executed via when you ru
 +--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+
 | Revision ID                    | Revises ID       | Airflow Version | Description                                                                           |
 +--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+
-| ``f9da662e7089`` (head)        | ``786e3737b18f`` | ``2.3.0``       | Add ``LogFilename`` table to track ``log_filename_template`` value changes.           |
+| ``f9da662e7089`` (head)        | ``786e3737b18f`` | ``2.3.0``       | Add ``LoTemplate`` table to track changes to config values ``log_filename_template``  |

Review comment:
       ```suggestion
   | ``f9da662e7089`` (head)        | ``786e3737b18f`` | ``2.3.0``       | Add ``LogTemplate`` table to track changes to config values ``log_filename_template``  |
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772943562



##########
File path: airflow/models/dagrun.py
##########
@@ -939,14 +939,27 @@ def schedule_tis(self, schedulable_tis: Iterable[TI], session: Session = NEW_SES
         return count
 
     @provide_session
-    def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> Optional[str]:
-        if self.log_filename_id is None:  # DagRun created before LogFilename introduction.
-            template = session.query(LogFilename.template).order_by(LogFilename.id).limit(1).scalar()
+    def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> str:
+        if self.log_template_id is None:  # DagRun created before LogTemplate introduction.
+            template = session.query(LogTemplate.filename).order_by(LogTemplate.id).limit(1).scalar()
         else:
-            template = session.query(LogFilename.template).filter_by(id=self.log_filename_id).scalar()
+            template = session.query(LogTemplate.filename).filter_by(id=self.log_template_id).scalar()
         if template is None:
             raise AirflowException(
-                f"No log_filename entry found for ID {self.log_filename_id!r}. "
+                f"No log_template entry found for ID {self.log_template_id!r}. "
+                f"Please make sure you set up the metadatabase correctly."

Review comment:
       Should we turn it into a wanrning and fall-back to the first log template id ?
   
   I think exception on missing log id is a bit too much as it might result in "stack-trace error".
   
   Also maybe the message could be somehow more specific. As the current one is not really actionable. I presume at the first start of airlfow there will be at least one entry created, so I guess the only case it can happen is when `log_template_id`  is set in DagRun but missing in the LogTemplate. 
   
   Maybe we should be very explicit "Your dag_run table is corrupted. The entry in your dag_run table with dag_run_id: <id> contains invalid log_template_id: <log_template_id>. You should set log_template_id for this field to one of the existing log_template_ids or set to NULL." . Or something like that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772679933



##########
File path: UPDATING.md
##########
@@ -89,11 +89,11 @@ Smart sensors, an "early access" feature added in Airflow 2, are now deprecated
 
 See [Migrating to Deferrable Operators](https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/smart-sensors.html#migrating-to-deferrable-operators) for details on how to migrate.
 
-### Task log filenames are not rendered from database entry instead of config value
+### Task log templates are now read from the metadatabase instead of `airflow.ini`

Review comment:
       ```suggestion
   ### Task log templates are now read from the metadatabase instead of `airflow.cfg`
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#issuecomment-998738148


   ```
   tests/executors/test_dask_executor.py::TestDaskExecutor::test_dask_executor_functions:
   ValueError: The futures should have finished; there is probably an error communicating with the Dask cluster.
   ```
   
   Hmm. I don’t recall seeing this being flaky previously, maybe something new going wrong? Likely not related though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #20435: Also track task_log_prefix_template changes

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #20435:
URL: https://github.com/apache/airflow/pull/20435#discussion_r772961071



##########
File path: airflow/models/dagrun.py
##########
@@ -939,14 +939,27 @@ def schedule_tis(self, schedulable_tis: Iterable[TI], session: Session = NEW_SES
         return count
 
     @provide_session
-    def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> Optional[str]:
-        if self.log_filename_id is None:  # DagRun created before LogFilename introduction.
-            template = session.query(LogFilename.template).order_by(LogFilename.id).limit(1).scalar()
+    def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> str:
+        if self.log_template_id is None:  # DagRun created before LogTemplate introduction.
+            template = session.query(LogTemplate.filename).order_by(LogTemplate.id).limit(1).scalar()
         else:
-            template = session.query(LogFilename.template).filter_by(id=self.log_filename_id).scalar()
+            template = session.query(LogTemplate.filename).filter_by(id=self.log_template_id).scalar()
         if template is None:
             raise AirflowException(
-                f"No log_filename entry found for ID {self.log_filename_id!r}. "
+                f"No log_template entry found for ID {self.log_template_id!r}. "
+                f"Please make sure you set up the metadatabase correctly."

Review comment:
       This can never happen because 1. `None` has special meaning, and 2. there’s a foreign key constraint so the database guanratees any non-null value points to something. Things are going very wrong if we reach this code, so I think it’s better to fail extremely loudly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org