Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/01 08:47:44 UTC

[GitHub] [airflow] potiuk opened a new issue #14539: Sometimes the runners have difficulty in starting in 10 minutes

potiuk opened a new issue #14539:
URL: https://github.com/apache/airflow/issues/14539


   Sometimes self-hosted runners do not start within 10 minutes, and such builds fail.
   
   Example here: 
   
   https://github.com/apache/airflow/actions/runs/608613545
   
   The following annotations are shown:
   
   ```
   
   The job running on runner Airflow Runner 82 has exceeded the maximum execution time of 10 minutes. | Cancel workflow runs
   -- | --
   The job running on runner Airflow Runner 82 has exceeded the maximum execution time of 10 minutes. | Cancel workflow runs
   Error when evaluating 'runs-on' for job 'cancel-on-build-failure'. (Line: 543, Col: 14): Error reading JToken from JsonReader. Path '', line 0, position 0.,(Line: 543, Col: 14): Unexpected value '' | Build Images: .github#L1
   
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #14539: Sometimes the self-hosted runners have difficulty in starting in 10 minutes

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14539:
URL: https://github.com/apache/airflow/issues/14539#issuecomment-787825750


   So the timestamp on GitHub for that action is 2021-02-28T21:43Z, but looking in the logs for "cancel-workflow-runs" via CloudWatch Logs Insights:
   
   (AWS creds are needed for the link, and logs are only kept for ~5 days.) I see the following log messages (which are across _all_ runners).
   
   [**CloudWatch Logs Insights**
   region: eu-central-1
   log-group-names: GitHubRunners
   start-time: 2021-02-28T21:00:00.000Z
   end-time: 2021-02-28T23:59:59.000Z
   ](https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logsV2:logs-insights$3FqueryDetail$3D$257E$2528end$257E$25272021-02-28T23*3a59*3a59.000Z$257Estart$257E$25272021-02-28T21*3a00*3a00.000Z$257EtimeType$257E$2527ABSOLUTE$257Etz$257E$2527Local$257EeditorString$257E$2527fields*20*40timestamp*2c*20*40logStream*2c*20message*0a*7c*20filter*20message*20like*20*27Cancel*20workflow*20runs*27*0a*7c*20sort*20*40timestamp*20desc*0a*7c*20limit*2010000$257EisLiveTail$257Efalse$257EqueryId$257E$2527582c4f04-c056-4614-97cc-16916474edde$257Esource$257E$2528$257E$2527GitHubRunners$2529$2529)
   
   query-string:
   ```
   fields @timestamp, @logStream, message
   | filter message like 'Cancel workflow runs'
   | sort @timestamp desc
   | limit 10000
   ```
   This doesn't show a cancel job within 15 minutes of the failed 11 min job :(
   
   I don't know _what_ has gone on.
   
   ------------------
   |       @timestamp        |    @logStream    |  message |
   |-------------------------|------------------|------------------------------------------|
   | 2021-02-28 22:46:11.607 | ip-172-31-42-165 | 2021-02-28 22:46:11Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:46:11.000 | ip-172-31-42-165 | WRITE LINE: 2021-02-28 22:46:11Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:45:46.505 | ip-172-31-42-165 | 2021-02-28 22:45:46Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:45:46.000 | ip-172-31-42-165 | WRITE LINE: 2021-02-28 22:45:46Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:29:37.157 | ip-172-31-39-0   | 2021-02-28 22:29:37Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:29:37.000 | ip-172-31-39-0   | WRITE LINE: 2021-02-28 22:29:37Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:29:10.306 | ip-172-31-39-0   | 2021-02-28 22:29:10Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:29:10.000 | ip-172-31-39-0   | WRITE LINE: 2021-02-28 22:29:10Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:27:59.718 | ip-172-31-34-137 | 2021-02-28 22:27:59Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:27:59.000 | ip-172-31-34-137 | WRITE LINE: 2021-02-28 22:27:59Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:27:36.313 | ip-172-31-34-137 | 2021-02-28 22:27:36Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:27:36.000 | ip-172-31-34-137 | WRITE LINE: 2021-02-28 22:27:36Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:25:36.527 | ip-172-31-22-230 | 2021-02-28 22:25:36Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:25:36.000 | ip-172-31-22-230 | WRITE LINE: 2021-02-28 22:25:36Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 22:24:48.876 | ip-172-31-22-230 | 2021-02-28 22:24:48Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:24:48.000 | ip-172-31-22-230 | WRITE LINE: 2021-02-28 22:24:48Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:08:29.700 | ip-172-31-45-136 | 2021-02-28 22:08:29Z: Running job: Cancel workflow runs |
   | 2021-02-28 22:08:29.000 | ip-172-31-45-136 | WRITE LINE: 2021-02-28 22:08:29Z: Running job: Cancel workflow runs |
   | 2021-02-28 21:18:23.051 | ip-172-31-19-206 | 2021-02-28 21:18:23Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 21:18:23.000 | ip-172-31-19-206 | WRITE LINE: 2021-02-28 21:18:23Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 21:17:57.453 | ip-172-31-19-206 | 2021-02-28 21:17:57Z: Running job: Cancel workflow runs |
   | 2021-02-28 21:17:57.000 | ip-172-31-19-206 | WRITE LINE: 2021-02-28 21:17:57Z: Running job: Cancel workflow runs |
   | 2021-02-28 21:13:53.575 | ip-172-31-31-247 | 2021-02-28 21:13:53Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 21:13:53.000 | ip-172-31-31-247 | WRITE LINE: 2021-02-28 21:13:53Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 21:13:29.928 | ip-172-31-31-247 | 2021-02-28 21:13:29Z: Running job: Cancel workflow runs |
   | 2021-02-28 21:13:29.000 | ip-172-31-31-247 | WRITE LINE: 2021-02-28 21:13:29Z: Running job: Cancel workflow runs |
   | 2021-02-28 21:07:39.117 | ip-172-31-47-152 | 2021-02-28 21:07:39Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 21:07:39.000 | ip-172-31-47-152 | WRITE LINE: 2021-02-28 21:07:39Z: Job Cancel workflow runs completed with result: Succeeded |
   | 2021-02-28 21:07:15.238 | ip-172-31-47-152 | 2021-02-28 21:07:15Z: Running job: Cancel workflow runs |
   | 2021-02-28 21:07:15.000 | ip-172-31-47-152 | WRITE LINE: 2021-02-28 21:07:15Z: Running job: Cancel workflow runs |
   ------
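
The "within 15 minutes" observation above can be checked mechanically against the table. The following is a minimal sketch, not part of the original comment: it assumes the `Running job` timestamps in the table and the 21:43Z run time are both UTC, and the names `STARTS`, `FAILED_RUN`, `starts_within` and `nearest_start` are hypothetical helpers introduced only for illustration.

```python
from datetime import datetime, timedelta

# Time of the failed workflow run as quoted in the comment (assumed UTC).
FAILED_RUN = datetime(2021, 2, 28, 21, 43)

# "Running job: Cancel workflow runs" timestamps copied from the table above,
# one per runner start (the duplicate "WRITE LINE" rows are left out).
STARTS = [
    "2021-02-28 22:45:46",
    "2021-02-28 22:29:10",
    "2021-02-28 22:27:36",
    "2021-02-28 22:24:48",
    "2021-02-28 22:08:29",
    "2021-02-28 21:17:57",
    "2021-02-28 21:13:29",
    "2021-02-28 21:07:15",
]


def parse(stamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a naive datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def starts_within(starts, reference, window):
    """Return the start times that lie within `window` of `reference`."""
    return [t for t in map(parse, starts) if abs(t - reference) <= window]


def nearest_start(starts, reference):
    """Return the start time closest to `reference`, before or after it."""
    return min(map(parse, starts), key=lambda t: abs(t - reference))


if __name__ == "__main__":
    print(starts_within(STARTS, FAILED_RUN, timedelta(minutes=15)))
    print(nearest_start(STARTS, FAILED_RUN))
```

With the rows listed above, no start falls inside the 15-minute window; the closest one is 21:17:57, roughly 25 minutes before the failed run.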
   





[GitHub] [airflow] potiuk closed issue #14539: Sometimes the self-hosted runners have difficulty in starting in 10 minutes

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #14539:
URL: https://github.com/apache/airflow/issues/14539


   

