You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/26 15:29:05 UTC

[GitHub] [airflow] flolas opened a new pull request #16002: fix remote logging when blob already exists

flolas opened a new pull request #16002:
URL: https://github.com/apache/airflow/pull/16002


   Fix the append mode for wasb remote logging.
   
   closes: #15907
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r639978127



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
+        :param append: if False, any existing log file is overwritten with the new log. If True,

Review comment:
       Much better !




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-852462319


   @potiuk done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637637964



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -180,15 +180,20 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
             the new log is appended to any existing logs.
         :type append: bool
         """
-        if append and self.wasb_log_exists(remote_log_location):
-            old_log = self.wasb_read(remote_log_location)
-            log = '\n'.join([old_log, log]) if old_log else log
-
         try:
-            self.hook.load_string(
-                log,
-                self.wasb_container,
-                remote_log_location,
-            )
+            if append and self.wasb_log_exists(remote_log_location):
+                self.hook.append_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location
+                )
+            else:
+                self.hook.load_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location,
+                    blob_type = 'AppendBlob',
+                    overwrite=True
+                )

Review comment:
       > or simply, use the read-overwrite method based on BlockBlob
   
   I prefer the above with the `overwrite` being an argument to the wasb_write method so it's not hardcoded.
   Also, see https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas closed pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas closed pull request #16002:
URL: https://github.com/apache/airflow/pull/16002


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r639843974



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
+        :param append: if False, any existing log file is overwritten with the new log. If True,

Review comment:
       What happens when:
   
   1) log file exists
   2) Append = false
   3) Overwrite = false
   
   I think it's not really obvious and help is not helpful. Could we clarify that in the description of the parameters ?
   ?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853694386


   And you might need to rebase as well. We had changed HEAD branch of Airflow to main and rebase might be needed 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-850427306


   @ephraimbuddy done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r639843974



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
+        :param append: if False, any existing log file is overwritten with the new log. If True,

Review comment:
       What happens when:
   
   1) log file exists
   2) Append = false
   3) Overwrite = false
   
   I think it's not really obvious and help is not really helpful in this situation. Could you please clarify that in the description of the parameters ?
   ?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853694386


   And you might need to rebase as well. We had changed HEAD branch of Airflow to main and rebase might be needed 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637624343



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -180,15 +180,20 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
             the new log is appended to any existing logs.
         :type append: bool
         """
-        if append and self.wasb_log_exists(remote_log_location):
-            old_log = self.wasb_read(remote_log_location)
-            log = '\n'.join([old_log, log]) if old_log else log
-
         try:
-            self.hook.load_string(
-                log,
-                self.wasb_container,
-                remote_log_location,
-            )
+            if append and self.wasb_log_exists(remote_log_location):
+                self.hook.append_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location
+                )
+            else:
+                self.hook.load_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location,
+                    blob_type = 'AppendBlob',
+                    overwrite=True
+                )

Review comment:
       or simply, use the read-overwrite method based on BlobkBlob....

##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -180,15 +180,20 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
             the new log is appended to any existing logs.
         :type append: bool
         """
-        if append and self.wasb_log_exists(remote_log_location):
-            old_log = self.wasb_read(remote_log_location)
-            log = '\n'.join([old_log, log]) if old_log else log
-
         try:
-            self.hook.load_string(
-                log,
-                self.wasb_container,
-                remote_log_location,
-            )
+            if append and self.wasb_log_exists(remote_log_location):
+                self.hook.append_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location
+                )
+            else:
+                self.hook.load_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location,
+                    blob_type = 'AppendBlob',
+                    overwrite=True
+                )

Review comment:
       or simply, use the read-overwrite method based on BlockBlob....




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-848313344


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-849474194


   > The S3 logger appends if the object already exists, so that is what we should do here too.
   
   Fully agree. I changed my mind.
   
   As @jhtimmins mentioned, two parameters are confusing. Append when exists is the best behaviour actually. Less flexibility is more in this case. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637623609



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -180,15 +180,20 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
             the new log is appended to any existing logs.
         :type append: bool
         """
-        if append and self.wasb_log_exists(remote_log_location):
-            old_log = self.wasb_read(remote_log_location)
-            log = '\n'.join([old_log, log]) if old_log else log
-
         try:
-            self.hook.load_string(
-                log,
-                self.wasb_container,
-                remote_log_location,
-            )
+            if append and self.wasb_log_exists(remote_log_location):
+                self.hook.append_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location
+                )
+            else:
+                self.hook.load_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location,
+                    blob_type = 'AppendBlob',
+                    overwrite=True
+                )

Review comment:
       @ephraimbuddy Maybe handling that case in the exception and converting to AppendBlob? What do u think? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-849451480


   > > If `append=False, overwrite=False` is invalid, and `append=True, overwrite=True` simply means append, I would just have one `append` kwargs (i.e. dont add any new arguments), and have `append=False` automatically overwrite. This should be good enough as long as we document the behaivour.
   > 
   > Good. @flolas make this change.
   > @jhtimmins @uranusjr part of my worry in this PR is whether we should really overwrite? Will it come to users as surprise?
   
   The S3 logger appends if the object already exists, so that is what we should do here too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #16002:
URL: https://github.com/apache/airflow/pull/16002


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637619547



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -180,15 +180,20 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
             the new log is appended to any existing logs.
         :type append: bool
         """
-        if append and self.wasb_log_exists(remote_log_location):
-            old_log = self.wasb_read(remote_log_location)
-            log = '\n'.join([old_log, log]) if old_log else log
-
         try:
-            self.hook.load_string(
-                log,
-                self.wasb_container,
-                remote_log_location,
-            )
+            if append and self.wasb_log_exists(remote_log_location):
+                self.hook.append_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location
+                )
+            else:
+                self.hook.load_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location,
+                    blob_type = 'AppendBlob',
+                    overwrite=True
+                )

Review comment:
       This looks like a breaking change. Take, for example, you have a BlockBlob log already in the container and you want to append to it, won't it be rejected since this is AppendBlob?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637618950



##########
File path: airflow/providers/microsoft/azure/CHANGELOG.rst
##########
@@ -18,6 +18,13 @@
 
 Changelog
 ---------
+2.0.1
+.....
+
+Bug fixes
+~~~~~~~~~
+
+* ``BugFix: Fix remote log in azure storage blob when blob already exists (#15907)``

Review comment:
       You don't need to add the changelog. Providers changelogs are generated automatically when they are to be released




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853859138


   @potiuk @uranusjr rebased and test are running again. 
   "Wait ffor CI images" log:
   ```
   {"errors":[{"code":"MANIFEST_UNKNOWN","message":"Docker image reference main-python3.6-ci-v2:52d61b9202fe994d19b37a84175fcd98a46a70be not found under repo \"apache\""}]}
   2440
   Still waiting. Status code 404!
   ```
   
   Maybe am I doing something wrong?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r639843974



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
+        :param append: if False, any existing log file is overwritten with the new log. If True,

Review comment:
       What happens when:
   
   1) log file exists
   2) Append = false
   3) Overwrite = false
   
   I think it's not really obvious and help is not really helpful in this situation. Could you clarify that in the description of the parameters ?
   ?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853678827


   GitHub had an outage yesterday; things should be back on now. https://www.githubstatus.com/incidents/76nv9h8pmkv4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853678827


   GitHub had an outage yesterday; things should be back on now. https://www.githubstatus.com/incidents/76nv9h8pmkv4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r640248752



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
-            the new log is appended to any existing logs.
+        :param append: if False and overwrite = True, any existing log file is overwritten with the new log.
+                       If True and overwrite = True, the new log is appended to any existing logs.
+                       If True and overwrite = False the method will fail
         :type append: bool
+        :param overwrite: if False, does not overwrite log file. If the file already exists the method will fail
+                          if True, any existing log file is overwritten.
+        :type overwrite: bool

Review comment:
       Valid states are:
   
   (append, overwrite)
   (True, True) 
   (False, True)
   (False, False) 
   
   But i dont know if we have a pattern for state (false, false). In the context of logs if we have the combination (false, false), when the log exists the method will fail 
   @ephraimbuddy what do u think? do we really need the separation or use only one argument? maybe always using overwrite =True into wasb_write method? If you look at s3 log handler, it has the same behaviour (replace=True)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853260797


   @potiuk is there a problem with CI Build / Wait for CI images (pull_request) ? always fails in tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r639971803



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
+        :param append: if False, any existing log file is overwritten with the new log. If True,

Review comment:
       @potiuk 
   Is okay now?
   ```
           :param append: if False and overwrite = True, any existing log file is overwritten with the new log.
                          If True and overwrite = True, the new log is appended to any existing logs.
                          If True and overwrite = False the method will fail
           :type append: bool
           :param overwrite: if False, does not overwrite log file. If the file already exists the method will fail
                             if True, any existing log file is overwritten.
           :type overwrite: bool
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-850532124


   @potiuk Can you take a look now? thanks. The behaviour implemented is the same as s3 task log handler. thx


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-848598592


   @flolas please rebase


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-849369596


   > If `append=False, overwrite=False` is invalid, and `append=True, overwrite=True` simply means append, I would just have one `append` kwargs (i.e. dont add any new arguments), and have `append=False` automatically overwrite. This should be good enough as long as we document the behaivour.
   
   Good. @flolas make this change. 
   @jhtimmins @uranusjr part of my worry in this PR is whether we should really overwrite? Will it come to users as surprise? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r639894494



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
+        :param append: if False, any existing log file is overwritten with the new log. If True,

Review comment:
       @potiuk yeah, np, will post asap here. thx




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-853859138


   @potiuk @uranusjr rebased and test are running again. 
   "Wait ffor CI images" log:
   ```
   {"errors":[{"code":"MANIFEST_UNKNOWN","message":"Docker image reference main-python3.6-ci-v2:52d61b9202fe994d19b37a84175fcd98a46a70be not found under repo \"apache\""}]}
   2440
   Still waiting. Status code 404!
   ```
   
   Maybe am I doing something wrong?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-849286513


   If `append=False, overwrite=False` is invalid, and `append=True, overwrite=True` simply means append, I would just have one `append` kwargs (i.e. dont add any new arguments), and have `append=False` automatically overwrite. This should be good enough as long as we document the behaivour.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637623609



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -180,15 +180,20 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
             the new log is appended to any existing logs.
         :type append: bool
         """
-        if append and self.wasb_log_exists(remote_log_location):
-            old_log = self.wasb_read(remote_log_location)
-            log = '\n'.join([old_log, log]) if old_log else log
-
         try:
-            self.hook.load_string(
-                log,
-                self.wasb_container,
-                remote_log_location,
-            )
+            if append and self.wasb_log_exists(remote_log_location):
+                self.hook.append_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location
+                )
+            else:
+                self.hook.load_string(
+                    log,
+                    self.wasb_container,
+                    remote_log_location,
+                    blob_type = 'AppendBlob',
+                    overwrite=True
+                )

Review comment:
       @ephraimbuddy Maybe handling that case in the exception and converting an BlockBlob to AppendBlob? What do u think? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-848267787


   @ephraimbuddy Have you had a chance to look at this yet? thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #16002: fix remote logging when blob already exists using append_block

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r637623862



##########
File path: airflow/providers/microsoft/azure/CHANGELOG.rst
##########
@@ -18,6 +18,13 @@
 
 Changelog
 ---------
+2.0.1
+.....
+
+Bug fixes
+~~~~~~~~~
+
+* ``BugFix: Fix remote log in azure storage blob when blob already exists (#15907)``

Review comment:
       Lol, will fix that too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-849111908


   @ephraimbuddy rebase ok


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas closed pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas closed pull request #16002:
URL: https://github.com/apache/airflow/pull/16002


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jhtimmins commented on a change in pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on a change in pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#discussion_r640233019



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -176,19 +178,18 @@ def wasb_write(self, log: str, remote_log_location: str, append: bool = True) ->
         :type log: str
         :param remote_log_location: the log's location in remote storage
         :type remote_log_location: str (path)
-        :param append: if False, any existing log file is overwritten. If True,
-            the new log is appended to any existing logs.
+        :param append: if False and overwrite = True, any existing log file is overwritten with the new log.
+                       If True and overwrite = True, the new log is appended to any existing logs.
+                       If True and overwrite = False the method will fail
         :type append: bool
+        :param overwrite: if False, does not overwrite log file. If the file already exists the method will fail
+                          if True, any existing log file is overwritten.
+        :type overwrite: bool

Review comment:
       This behavior doesn't really make sense to me. Specifically, it seems like `append + overwrite == append`.
   
   More generally, I think that deriving behavior as the cross product of two different arguments is an antipattern. Is there a way that this can be simplified? Is `append=False && overwrite=False` a valid state? If not, and there are only two valid. states, then this can be simplified into one argument.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-852958999


   Reopening to rerun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas edited a comment on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
flolas edited a comment on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-850532124


   @potiuk could you take a look now? thanks. The behaviour implemented is the same as s3 task log handler. thx


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16002: fix remote logging when blob already exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16002:
URL: https://github.com/apache/airflow/pull/16002#issuecomment-851556006


   Can you please rebase so that we can run the tests?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org