Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/08/16 17:43:22 UTC

[GitHub] [airflow] ashb commented on a change in pull request #5825: [AIRFLOW-5218] less polling for AWS Batch status

URL: https://github.com/apache/airflow/pull/5825#discussion_r314822320
 
 

 ##########
 File path: airflow/contrib/operators/awsbatch_operator.py
 ##########
 @@ -133,16 +134,25 @@ def _wait_for_task_ended(self):
             retry = True
             retries = 0
 
-            while retries < self.max_retries and retry:
-                self.log.info('AWS Batch retry in the next %s seconds', retries)
-                response = self.client.describe_jobs(
-                    jobs=[self.jobId]
-                )
-                if response['jobs'][-1]['status'] in ['SUCCEEDED', 'FAILED']:
+            # Allow a batch job a minute to spin up.  A random interval
+            # decreases the chances of exceeding an AWS API throttle
+            # limit when there are many concurrent tasks.
+            sleep(randint(10, 60))
+
+            while True:
+                response = self.client.describe_jobs(jobs=[self.jobId])
+                status = response['jobs'][-1]['status']
+                self.log.info('AWS Batch status: %s', status)
+                if status in ['SUCCEEDED', 'FAILED']:
                     retry = False
 
 Review comment:
   Since we are making changes here, this loop could be tidied up a bit.
   
   ```python
               while retries < self.max_retries and retry:
                   response = self.client.describe_jobs(jobs=[self.jobId])
                   status = response['jobs'][-1]['status']
                   self.log.info('AWS Batch status: %s', status)
                   if status in ['SUCCEEDED', 'FAILED']:
                       break
   
                   retries += 1
                   pause = 1 + pow(retries * 0.3, 2)
                   self.log.info('AWS Batch status check (%d) in the next %.2f seconds', retries, pause)
                   sleep(pause)
   ```
   
   It also feels _very_ odd that, if we run out of retries while the task is still running, this just falls out of the bottom of the loop!
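
To make the "falls out the bottom" concern concrete, here is a minimal standalone sketch of the same polling loop that raises once the retries are exhausted instead of returning silently. It is not the operator's actual code: `wait_for_job`, `backoff_pause`, `get_status` and `BatchJobTimeout` are hypothetical names made up for illustration, and `get_status` stands in for the `describe_jobs` lookup shown above. In the operator itself, raising `AirflowException` would probably be the more natural choice.

```python
import time


class BatchJobTimeout(RuntimeError):
    """Hypothetical exception type, used only for this sketch."""


def backoff_pause(retries):
    # Same formula as the suggested loop: 1 + (0.3 * retries) ** 2 seconds.
    return 1 + pow(retries * 0.3, 2)


def wait_for_job(get_status, max_retries, sleep=time.sleep):
    """Poll get_status() until the job is terminal, or raise when retries run out.

    get_status is an assumed callable standing in for
    client.describe_jobs(jobs=[job_id])['jobs'][-1]['status'].
    """
    retries = 0
    while retries < max_retries:
        status = get_status()
        if status in ('SUCCEEDED', 'FAILED'):
            return status
        retries += 1
        sleep(backoff_pause(retries))
    # Reaching this point means the job was still running after the last check,
    # so fail loudly rather than letting the caller assume it finished.
    raise BatchJobTimeout(
        'job still not finished after %d status checks' % max_retries
    )
```

Passing `sleep` as a parameter is only there so the loop can be exercised in a test without real delays; the caller still has to decide what a `FAILED` status should mean.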
