Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/26 04:56:17 UTC

[GitHub] [airflow] uranusjr commented on a change in pull request #19027: Fix for DockerOperator Xcoms functionality

uranusjr commented on a change in pull request #19027:
URL: https://github.com/apache/airflow/pull/19027#discussion_r792313131



##########
File path: airflow/providers/docker/operators/docker.py
##########
@@ -304,21 +304,24 @@ def _run_image_with_mounts(self, target_mounts, add_tmp_variable: bool) -> Optio
             working_dir=self.working_dir,
             tty=self.tty,
         )
-        lines = self.cli.attach(container=self.container['Id'], stdout=True, stderr=True, stream=True)
+        logstream = self.cli.attach(container=self.container['Id'], stdout=True, stderr=True, stream=True)
         try:
             self.cli.start(self.container['Id'])
 
-            line = ''
+            log_chunk = ''
             res_lines = []
             return_value = None
-            for line in lines:
-                if hasattr(line, 'decode'):
-                    # Note that lines returned can also be byte sequences so we have to handle decode here
-                    line = line.decode('utf-8')
-                line = line.strip()
-                res_lines.append(line)
-                self.log.info(line)
+            for log_chunk in logstream:
+                if hasattr(log_chunk, 'decode'):
+                    # Note that log_chunk returned can also be byte sequences so we have to handle decode here
+                    log_chunk = log_chunk.decode('utf-8')
+                log_chunk = log_chunk.strip()
+                res_lines.append(log_chunk)
+                self.log.info(log_chunk)
             result = self.cli.wait(self.container['Id'])
+            # after container has exited, grab the entire log ignoring the chunked log stream that was used with attach
+            # self.cli.logs uses docker's /containers/{id}/logs, while self.cli.attach uses /containers/{id}/attach
+            lines = self.cli.logs(container=self.container['Id'], stdout=True, stderr=True, stream=True)

Review comment:
       `logstream` only contains logs from the moment the attach happens, so it may lose anything emitted between that point and container startup. I’m not sure if it makes a difference, but honestly the additional API call is pretty cheap anyway, and it might actually be more efficient with `xcom_all=False` because we don’t need to hold all the previous logs in memory.
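
       To illustrate the memory point, here is a minimal sketch, not the code in this PR. It assumes `cli` behaves like docker-py's `APIClient` (`wait` and `logs` with the keyword arguments used in the diff) and that each streamed chunk is roughly one log line. The helper names `_to_text` and `xcom_from_logs` are hypothetical.

```python
from collections import deque
from typing import Optional, Union


def _to_text(chunk: Union[bytes, str]) -> str:
    """Decode a chunk that may arrive as bytes or str, then trim whitespace."""
    if isinstance(chunk, bytes):
        chunk = chunk.decode("utf-8")
    return chunk.strip()


def xcom_from_logs(cli, container_id: str, xcom_all: bool) -> Optional[str]:
    """Hypothetical helper: build the XCom value from a fresh logs call.

    `cli` is assumed to expose `wait` and `logs` like docker-py's APIClient.
    """
    # Block until the container exits so the log is complete.
    cli.wait(container_id)
    # Second API call: read the full log instead of reusing the attach stream.
    stream = cli.logs(container=container_id, stdout=True, stderr=True, stream=True)
    if xcom_all:
        # Everything has to be held in memory to return it as one value.
        lines = [_to_text(chunk) for chunk in stream]
        return "\n".join(lines) if lines else None
    # Only the final chunk is needed; a one-slot deque discards the rest
    # as the stream is consumed.
    last = deque((_to_text(chunk) for chunk in stream), maxlen=1)
    return last[0] if last else None
```

       With `xcom_all=False` the stream is consumed one chunk at a time and only the last one is retained, which is why the extra call need not cost memory proportional to the log size.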




