You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/04 08:51:20 UTC

[GitHub] [airflow] JPonte opened a new pull request #16264: Add flag to delete local logs after upload

JPonte opened a new pull request #16264:
URL: https://github.com/apache/airflow/pull/16264


   Add flag to delete local logs after upload
   ---
   closes #16142
   
   Adds a new config flag `keep_local_logs` to give the option to delete the local logs after they're successfully uploaded to S3/GCS/WASB.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-931921560


   @JPonte can you please fix the failing tests and rebase?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r743033278



##########
File path: airflow/config_templates/config.yml
##########
@@ -495,6 +495,13 @@
       type: string
       example: ~
       default: ""
+    - name: keep_local_logs
+      description: |
+        Whether the local file logs for GCS, S3 and WASB remote logging should be kept after they are uploaded
+      version_added: 2.2.0

Review comment:
       ```suggestion
         version_added: 2.3.0
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-1029516798


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JavierLopezT commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-893501062


   Commenting


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r654192109



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -100,8 +100,8 @@ def close(self) -> None:
             with open(local_loc) as logfile:
                 log = logfile.read()
             self.wasb_write(log, remote_loc, append=True)
-
-            if self.delete_local_copy:
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if self.delete_local_copy or not keep_local:

Review comment:
       My only issue was keeping compatibility for those who right now rely on that flag, not sure what's the policy regarding that.

##########
File path: airflow/providers/google/cloud/log/gcs_task_handler.py
##########
@@ -132,7 +134,10 @@ def close(self):
             # read log and remove old logs to get just the latest additions
             with open(local_loc) as logfile:
                 log = logfile.read()
-            self.gcs_write(log, remote_loc)
+            success = self.gcs_write(log, remote_loc)
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if success and not keep_local:
+                shutil.rmtree(os.path.dirname(local_loc))

Review comment:
       I avoided putting it on `FileTaskHandler` because it was a bit messy but having a `RemoteFileTaskHandler` could indeed help there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-891411301


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-961151046


   @JPonte tests are still failing. Can you check it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-932980236


   @Jponte  - this "commonalising" of the code has it's drawback - nicely caught by our backwards compatibiliy installation and import tests for providers: https://github.com/apache/airflow/pull/16264/checks?check_run_id=3764848711#step:10:1463  
   
   The problem is that if you extract some code to the "airflow.utils" package and create a new function/module there, the providers will not work for earlier airflow versions (because the "common" part is not there)
   
   We could increase min airflow version as dependency, but I think this feature does not justify it (it should still be possible to install the providers on Airflow 2.1+) so we need to make sure it is backwards-compatible.
   
   My suggestion is to keep a copy of the "common" methods in each of the providers affected (additionally to the common one) and implement a fallback to use those in case importing it from "utils" airflow code fails. When you mark it clearly as "to be remeoved"  after the provider depends on Airflow 2.2 being available) it will provide a clear path of the dupilicte code removal in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r743033278



##########
File path: airflow/config_templates/config.yml
##########
@@ -495,6 +495,13 @@
       type: string
       example: ~
       default: ""
+    - name: keep_local_logs
+      description: |
+        Whether the local file logs for GCS, S3 and WASB remote logging should be kept after they are uploaded
+      version_added: 2.2.0

Review comment:
       ```suggestion
         version_added: 2.3.0
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] o-nikolas commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r653908548



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -100,8 +100,8 @@ def close(self) -> None:
             with open(local_loc) as logfile:
                 log = logfile.read()
             self.wasb_write(log, remote_loc, append=True)
-
-            if self.delete_local_copy:
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if self.delete_local_copy or not keep_local:

Review comment:
       I wonder if `delete_local_copy` is still needed now that you have introduced this global behaviour?

##########
File path: airflow/providers/google/cloud/log/gcs_task_handler.py
##########
@@ -132,7 +134,10 @@ def close(self):
             # read log and remove old logs to get just the latest additions
             with open(local_loc) as logfile:
                 log = logfile.read()
-            self.gcs_write(log, remote_loc)
+            success = self.gcs_write(log, remote_loc)
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if success and not keep_local:
+                shutil.rmtree(os.path.dirname(local_loc))

Review comment:
       You're implementing the same cleanup recipe several times on the back of a global config, both of which are indicators that this is a good candidate for logic that should live in a super class. Doing it ad hoc like this leaves us open for future developers of remote logging classes to forget or mis-implement this logic. The individual remote logging classes should only be responsible for doing the upload to their respective service, they shouldn't have to re-implement the cleanup.
   
   It is possible to teach `FileTaskHandler` to do this, but it would be tricky to make it work in both cases and is a bit smelly. It's maybe time for a new super class `RemoteTaskHandler`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
JPonte commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-933000574


   @potiuk Thanks for the input, I wasn't understanding why the tests were failing when they passed locally. I'll try to do the changes you recommended.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JPonte commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
JPonte commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r646094037



##########
File path: airflow/providers/amazon/aws/log/s3_task_handler.py
##########
@@ -93,7 +94,10 @@ def close(self):
             # read log and remove old logs to get just the latest additions
             with open(local_loc) as logfile:
                 log = logfile.read()
-            self.s3_write(log, remote_loc)
+            success = self.s3_write(log, remote_loc)
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if success and not keep_local:
+                shutil.rmtree(os.path.dirname(local_loc))

Review comment:
       Is that possible? This happens on the close() of the handler. Is there any way another task handler has the same local_loc path and is trying to write to it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r646178965



##########
File path: airflow/providers/amazon/aws/log/s3_task_handler.py
##########
@@ -93,7 +94,10 @@ def close(self):
             # read log and remove old logs to get just the latest additions
             with open(local_loc) as logfile:
                 log = logfile.read()
-            self.s3_write(log, remote_loc)
+            success = self.s3_write(log, remote_loc)
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if success and not keep_local:
+                shutil.rmtree(os.path.dirname(local_loc))

Review comment:
       Yep, I think you are right. Given `local_loc` is decided by `ti` + `try_number`, plus it's in `close()`, my earlier question should not be valid.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r645996867



##########
File path: airflow/providers/amazon/aws/log/s3_task_handler.py
##########
@@ -93,7 +94,10 @@ def close(self):
             # read log and remove old logs to get just the latest additions
             with open(local_loc) as logfile:
                 log = logfile.read()
-            self.s3_write(log, remote_loc)
+            success = self.s3_write(log, remote_loc)
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if success and not keep_local:
+                shutil.rmtree(os.path.dirname(local_loc))

Review comment:
       What if something new is written to `local_loc` between `s3_write()` and `shutil.rmtree()`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JavierLopezT commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-855771920


   Great feature. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#issuecomment-997648974


   @JPonte there are still some failures and conflicts. can you take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] closed pull request #16264: Add flag to delete local logs after upload

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #16264:
URL: https://github.com/apache/airflow/pull/16264


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org