Posted to dev@beam.apache.org by Tyson Hamilton <ty...@google.com> on 2021/07/01 18:13:31 UTC

Jenkins Disk Utilization Alerts

Hi All,

A bunch of our Jenkins jobs were failing because of full disks. It turns
out the inventory jobs were not running: they had been stuck for 8 days,
ever since the issue we had with no agents being present in the Jenkins
cluster.
Those inventory jobs clean up the disk space on the Jenkins agent VMs.

Ahmet helped fix the issue by cancelling the stuck inventory jobs and
rerunning them manually, which requires a Jenkins login. Thanks Ahmet!

To avoid this type of issue in the future, or at least catch it earlier,
I've created an alert in GCP that sends an email to this group when disk
utilization is >85% [1]. The alert provides some suggestions on how to fix
the issue; I'll also update the cwiki with that information.
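
For anyone who wants to check an agent by hand, here is a minimal Python
sketch of the same kind of threshold check. It is only an illustration and
not the GCP alert itself: the function names and the "/" path are
assumptions, and the number it reports can differ slightly from what `df`
or the GCP metric shows, since they may account for reserved blocks
differently.

```python
import shutil

# Matches the >85% alert threshold mentioned above.
THRESHOLD_PERCENT = 85.0


def disk_utilization_percent(path="/"):
    """Return used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_over_threshold(percent, threshold=THRESHOLD_PERCENT):
    """Return True when utilization is strictly above the threshold."""
    return percent > threshold


if __name__ == "__main__":
    # "/" is an assumed mount point; use whichever path the agent builds on.
    pct = disk_utilization_percent("/")
    status = "ALERT" if is_over_threshold(pct) else "ok"
    print(f"/ is {pct:.1f}% full: {status}")
```

Running this on an agent VM prints the current utilization and whether it
is above the threshold.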

We can adjust this threshold if it becomes too spammy. It may take some
time to settle. I'd like to recommend that whoever decides to take action
on an alert reply to the thread, so that we can minimize the number of
people investigating at the same time.

[1]: https://photos.app.goo.gl/DYWcG1UwcgzEfbmCA

-Tyson