You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/05 21:32:56 UTC

[GitHub] [airflow] jens-scheffler-bosch opened a new pull request, #26169: Feature/15783 auto tail file logs in UI

jens-scheffler-bosch opened a new pull request, #26169:
URL: https://github.com/apache/airflow/pull/26169

   This PR implements a log tail feature on file_task_handler
   
   As described in #15783 it would be a great feature to have log tail available in Airflow UI so that if you watch something running you don't need to hit the F5 key constantly to follow the running task.
   
   Actually I saw the same demand and with this PR want to contribute this feature upstream.
   
   I wanted to implement this from scratch when I realized that an auto-tail is already implemented in the UI, just the back-end needs to feed data in a way that tail is active. So this PR is a minimal-invasive implementation on the `file_task_handler.py` alongside with small adjustments to inheriting provider implementation.
   
   closes: #15783
   related: #15783
   
   How to test:
   * Start a long running DAG based on an airflow instance from this branch and see that the running task instance which produces logs will add content to the view.
   
   Example DAG I used for testing, running 200 seconds and adding log lines every second:
   ```
   import pendulum
   
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   with DAG(
       dag_id='a_log_running_dag',
       schedule='0 * * * *',
       start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
       catchup=False,
   ) as dag:
       BashOperator(
           task_id='runme',
           bash_command='for i in {1..200}; do echo $i; sleep 1; done',
       )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #26169:
URL: https://github.com/apache/airflow/pull/26169#discussion_r963445277


##########
airflow/providers/microsoft/azure/log/wasb_task_handler.py:
##########
@@ -127,7 +129,7 @@ def _read(self, ti, try_number: int, metadata: Optional[str] = None) -> Tuple[st
             log = f'*** Reading remote log from {remote_loc}.\n{remote_log}\n'
             return log, {'end_of_log': True}
         else:
-            return super()._read(ti, try_number)
+            return super()._read(ti, try_number, metadata)

Review Comment:
   You need to support not passing `metadata` here, otherwise all provider log handlers would be incompatible with 2.3 (and below), which is not acceptable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jens-scheffler-bosch commented on pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
jens-scheffler-bosch commented on PR #26169:
URL: https://github.com/apache/airflow/pull/26169#issuecomment-1243040032

   Hi @uranusjr Thanks for the review, the PR is "stale" now since a few days. Are further reviews needed? If yes, please let me know whom to get in touch with. Otherwise, if all is okay, who would be eligible to merge? (preventing further conflicts)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jens-scheffler-bosch commented on a diff in pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
jens-scheffler-bosch commented on code in PR #26169:
URL: https://github.com/apache/airflow/pull/26169#discussion_r963716278


##########
airflow/providers/microsoft/azure/log/wasb_task_handler.py:
##########
@@ -127,7 +129,7 @@ def _read(self, ti, try_number: int, metadata: Optional[str] = None) -> Tuple[st
             log = f'*** Reading remote log from {remote_loc}.\n{remote_log}\n'
             return log, {'end_of_log': True}
         else:
-            return super()._read(ti, try_number)
+            return super()._read(ti, try_number, metadata)

Review Comment:
   Thanks @uranusjr for the feedback! I made the changes in the provider log handlers in order to have a consistent status. Passing `metadata` now with this PR should only introduce consistency - as the client passes the `metadata` but current providers do not pass the `metadata` to the respective super-class consistently when running into the fallback.
   
   I double checked and the super class method interface is consistent since at least v2.0.0 so **it is not a breaking change** but a matter of consistency. Not passing `metadata` is and will further be supported.
   
   If you insist on reverting the provider package adjustments then I see only two options:
   
   1) Current log position can not be passed from web UI client to fallback provider class - in this case log would be tailed for running tasks always from the start (as client's last log position is known to super-class (file_task_handler.py)). This would also be the effect when using an older provider package with an Airflow version with this change applied.
   Means all users using not the file based log provider and taking a look to UI for a running TaskInstance will over-and-over tail the logs. Not a problem if task is not running anymore or provider package returns w/o super class code.
   
   2) We would need to adjust the interface between Web UI client and `file_task_handler.py` back-end to explicitly enable tailing for some use cases only. Then it would be a breaking change on client side (manageable) and provider packages will not benefit from super-class tailing capabilities.
   Means all users with standard file task handler will have a benefit of tailing and every provider package must implement log tailing individually. No tailing also in file fallback.
   
   Please let me know your opinion.
   
   (Overall I have the feeling that the log handling/provider feature detail is very badly documented, took a long time of code reading to understand details and fields... but I did not want to propose a major re-factoring in this area just for this feature)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jens-scheffler-bosch commented on a diff in pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
jens-scheffler-bosch commented on code in PR #26169:
URL: https://github.com/apache/airflow/pull/26169#discussion_r964777051


##########
tests/utils/log/test_log_reader.py:
##########
@@ -199,6 +199,7 @@ def test_read_log_stream_should_support_multiple_chunks(self, mock_read):
         mock_read.side_effect = [first_return, second_return, third_return, fourth_return]
 
         task_log_reader = TaskLogReader()
+        self.ti.state = State.SUCCESS

Review Comment:
   Thanks, applied



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #26169:
URL: https://github.com/apache/airflow/pull/26169#discussion_r964775094


##########
tests/utils/log/test_log_reader.py:
##########
@@ -199,6 +199,7 @@ def test_read_log_stream_should_support_multiple_chunks(self, mock_read):
         mock_read.side_effect = [first_return, second_return, third_return, fourth_return]
 
         task_log_reader = TaskLogReader()
+        self.ti.state = State.SUCCESS

Review Comment:
   Better to use `TaskInstanceState.SUCCESS` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #26169:
URL: https://github.com/apache/airflow/pull/26169#discussion_r965449185


##########
airflow/utils/log/log_reader.py:
##########
@@ -26,6 +26,7 @@
 from airflow.utils.helpers import render_log_filename
 from airflow.utils.log.logging_mixin import ExternalLoggingMixin
 from airflow.utils.session import NEW_SESSION, provide_session
+from airflow.utils.state import State

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #26169: Auto tail file logs in Web UI

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #26169:
URL: https://github.com/apache/airflow/pull/26169


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #26169: Feature/15783 auto tail file logs in UI

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #26169:
URL: https://github.com/apache/airflow/pull/26169#issuecomment-1237470799

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org