Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/02 14:18:45 UTC

[GitHub] [airflow] jimmycfa edited a comment on issue #16763: SagemakerProcessingOperator ThrottlingException

jimmycfa edited a comment on issue #16763:
URL: https://github.com/apache/airflow/issues/16763#issuecomment-873010485


   For posterity's sake, we are using boto3==1.17.99. This actually appears to be an issue with the way the NameContains filter gets applied:
   
   NameContains is passed into list_processing_jobs, but it doesn't actually filter across the entire set of ProcessingJobs. It appears to filter per batch of 100, so you still end up calling list_processing_jobs in that SagemakerOperator 30+ times back to back. Put another way: if I specify NameContains in list_processing_jobs with a job name that doesn't exist and I have over 3500 processing jobs, it returns an empty set of ProcessingJobSummaries BUT still includes a NextToken. It will do this 35 more times, since the max results = 100 for that call, and you will likely run into throttling issues.
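A minimal sketch of the paging behaviour described above, in case it helps reproduce the call count. `count_list_calls` and `FakeSageMakerClient` are hypothetical names introduced here. The fake client only imitates the per-batch filtering as reported in this comment and is not taken from boto3. The `NameContains`, `MaxResults`, `NextToken`, and `ProcessingJobSummaries` keys are the ones `list_processing_jobs` uses.

```python
def count_list_calls(client, name_contains, max_results=100):
    """Follow NextToken until it is absent; return (matches, number of API calls)."""
    kwargs = {"NameContains": name_contains, "MaxResults": max_results}
    matches, calls = [], 0
    while True:
        response = client.list_processing_jobs(**kwargs)
        calls += 1
        matches.extend(response.get("ProcessingJobSummaries", []))
        token = response.get("NextToken")
        if not token:
            return matches, calls
        kwargs["NextToken"] = token


class FakeSageMakerClient:
    """Hypothetical stand-in: filters each batch separately, as reported above."""

    def __init__(self, job_names):
        self._job_names = job_names

    def list_processing_jobs(self, NameContains, MaxResults=100, NextToken=None):
        start = int(NextToken) if NextToken else 0
        batch = self._job_names[start:start + MaxResults]
        response = {
            "ProcessingJobSummaries": [
                {"ProcessingJobName": name} for name in batch if NameContains in name
            ]
        }
        # A token is returned whenever more jobs remain, even if this batch matched nothing.
        if start + MaxResults < len(self._job_names):
            response["NextToken"] = str(start + MaxResults)
        return response


fake = FakeSageMakerClient([f"job-{i}" for i in range(3500)])
matches, calls = count_list_calls(fake, "does-not-exist")
print(len(matches), calls)  # 0 35
```

Against a real account, the same function could presumably be called with `boto3.client("sagemaker")` in place of the fake; that would issue live API calls and may itself be throttled.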
   
   I believe the expected behavior of that boto3 call is that the NameContains filter is applied to the entire set of jobs before results are returned, rather than per batch, so that the first call returns an empty set for ProcessingJobSummaries and NO NextToken.
   
   I'm going to reopen but this does appear to be a boto3 issue.
   
   Our current workaround was to update the `aws_default` connection in Admin->Connections and add the following to Extra:
   ```json
   {
      "config_kwargs":{
         "retries":{
            "max_attempts":10,
            "mode":"standard"
         }
      }
   }
   ```
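For anyone calling boto3 directly rather than through an Airflow connection, the same retry settings can probably be passed via `botocore.config.Config`. This is a sketch under that assumption: `config_kwargs_from_extra` and `make_sagemaker_client` are hypothetical helper names, and the `region_name` argument is a placeholder, not something from the setup above.

```python
import json

# The Extra JSON from the workaround above.
EXTRA = '{"config_kwargs": {"retries": {"max_attempts": 10, "mode": "standard"}}}'


def config_kwargs_from_extra(extra):
    """Pull the config_kwargs mapping out of a connection Extra JSON string."""
    return json.loads(extra).get("config_kwargs", {})


def make_sagemaker_client(extra, region_name):
    """Build a SageMaker client that uses the retry settings from Extra."""
    # Imported lazily so the parsing helper above works without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "sagemaker",
        region_name=region_name,
        config=Config(**config_kwargs_from_extra(extra)),
    )


print(config_kwargs_from_extra(EXTRA))
```

The `standard` retry mode retries throttling errors with backoff, so raising `max_attempts` mostly buys time for the repeated list calls rather than reducing how many are made.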

