Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/21 08:23:24 UTC

[GitHub] [airflow] akki commented on a change in pull request #9464: Fix DockerOperator xcom

akki commented on a change in pull request #9464:
URL: https://github.com/apache/airflow/pull/9464#discussion_r509083348



##########
File path: airflow/providers/docker/operators/docker.py
##########
@@ -256,29 +257,34 @@ def _run_image(self) -> Optional[str]:
 
             lines = self.cli.attach(container=self.container['Id'], stdout=True, stderr=True, stream=True)
 
-            self.cli.start(self.container['Id'])
+            def gen_output(stdout=False, stderr=False):
+                return (

Review comment:
       Hi
   
    I don't think using a generator instead of a list here solves the memory issue. The data will still end up held in memory, no matter whether you use a list or a generator.
    I did a small test to verify this; I ran the following code in a Python 3 terminal:
   ```
   >>> with open('yes.log', 'r') as file_:
    ...   x = (linee for linee in file_.read())  # read() already loads the whole file
   ...
   ```
   where `yes.log` was a 650 MB file.
    As soon as this code block finished executing, I saw the memory used by the process increase by 650 MB. That makes sense: `file_.read()` loads the entire file into memory as a single string, and wrapping it in a generator expression does not change where that data lives.
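    
    For contrast, here's a minimal sketch of the lazy alternative (with a hypothetical `process` consumer): iterating the file object directly yields one line at a time, so memory stays bounded no matter how large the file is:
    ```
    >>> with open('yes.log', 'r') as file_:
    ...   for line in file_:  # lazy: one line in memory at a time
    ...     process(line)     # hypothetical consumer of each line
    ...
    ```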
   
    I think you might be able to achieve what you're trying to do by actually streaming the logs and consuming them lazily with `yield`, so that each chunk is processed and discarded rather than accumulated.
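    
    To illustrate, here is a rough sketch of what I mean (the function name `stream_logs` is mine, untested against this PR), using docker-py's low-level `APIClient.attach(..., stream=True)`, which returns a generator of raw byte chunks:
    ```
    def stream_logs(cli, container_id):
        # attach(..., stream=True) yields chunks lazily as the container
        # produces output, so only one chunk at a time is held in memory
        for chunk in cli.attach(container=container_id, stdout=True,
                                stderr=True, stream=True):
            yield chunk.decode('utf-8', errors='replace')
    ```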




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org