Posted to builds@apache.org by Joan Touzet <wo...@apache.org> on 2018/07/14 05:29:52 UTC

Jenkins build hosts filling up...needs everyone's help!

Hi there,

Chris over in  https://issues.apache.org/jira/browse/INFRA-16768
recommended I start a thread.

We've been getting increasing numbers of failures on our builds
due to nodes running out of disk space. In 16768, Chris says:

"We are getting to the point where builds are running machines out of space faster than we can clear them out. I've cleaned up H24 a bit. We'll discuss further, but this is going to take some cooperation amongst all builders to start purging workspaces."

So this is the requested thread. What do we, collectively, as
Jenkins users, need to do to clean out workspaces? I'm
fairly sure that CouchDB workspaces are pretty clean, since we
do the bulk of our builds in /tmp and try our best to clean up
after failed builds. But I am happy to admit I don't know
everything I should or shouldn't be doing.

Chris, would a list of the "top offenders" be useful? I'm not
looking to shame anyone, but shedding a little light on the
approaches that are the biggest problem might help.

-Joan

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Karl Heinz Marbaise <kh...@gmx.de>.

Hi,

following up on this thread: I've run into the situation of being blocked
by a full Windows server
(https://builds.apache.org/computer/windows-2012-1/), which has filled up
again.

So I have taken the liberty of wiping out the workspaces of the following
projects:

flex-flexunit (maven)
flex-flexunit (maven)
flex-sdk-converter (maven)
flex-tool-api (maven)
Mesos-Reviewbot-Windows (no workspace there)
Mesos-Windows
Royale-asjs
Royale-compiler
Royale-typedefs


Kind regards
Karl Heinz Marbaise
Apache Maven PMC


On 14/07/18 12:00, Karl Heinz Marbaise wrote:
> Hi Joan,
> 
> thanks for starting this thread...
> 
> I've also seen many issues about that, and the Maven team has been
> affected by them.
> 
> I have started to change our global Jenkins lib to handle that,
> because I think we have so many builds that in the end we will consume
> a lot of disk space.
> 
> Based on those issues I have done some calculations for our Maven builds.
> 
> We have 96 components/plugins etc., which means 96 jobs to start with
> (multibranch pipelines).
> 
> Each of those has at least a master branch, which means the lowest
> number of builds is 96 (let's say 100 for the sake of convenience), but
> each of them is built with JDK 7, JDK 8, JDK 9 and JDK 10 (see [2]),
> which results in a minimum of:
> 
> 4 * 100 builds (workspaces)...and that's not all...
> 
> So I think it would be great to have a list of disk space usage on
> the nodes, broken down by project/job.
> 
> @INFRA
> Is there a way I could get this information on my own (without bothering
> the INFRA team) to see if my changes are helping?
> 
> Kind regards
> Karl Heinz Marbaise
> Apache Maven PMC
> 
> 
> [1]: https://builds.apache.org/view/M-R/view/Maven/job/maven-box/
> [2]: 
> https://builds.apache.org/view/M-R/view/Maven/job/maven-box/job/maven-acr-plugin/job/master/workflow-stage/ 

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Robert Munteanu <ro...@apache.org>.

On Sat, 2018-07-14 at 12:00 +0200, Karl Heinz Marbaise wrote:
> So I think it would be great to have a list of disk space usage on
> the nodes, broken down by project/job.

+1

We in the Apache Sling project also have a large number of projects
(276), but I expect the workspace to be small. Having a list of disk
usage by job would allow us to understand whether we need to change
anything or not.

Thanks,

Robert
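
A per-job disk usage list like the one asked for here can be approximated
from a shell step, assuming the job is allowed to run du on the node and
that all workspaces sit under one root directory. The sketch below is only
an illustration: the function name top_workspaces and its arguments are
hypothetical, and the workspace path in the usage comment is borrowed from
Gav's message elsewhere in this thread, not from any documented layout.

```shell
#!/bin/sh
# Hypothetical helper: list the largest non-hidden entries directly under
# a workspace root, biggest first.
#   $1 = workspace root directory
#   $2 = how many entries to show (default 10)
top_workspaces() {
    root="$1"
    top_n="${2:-10}"
    # du -sk prints "<size in KiB><TAB><path>" for each entry;
    # sort -rn orders those lines by the leading number, largest first.
    du -sk "$root"/* 2>/dev/null | sort -rn | head -n "$top_n"
}

# Example (path is an assumption):
#   top_workspaces "$JENKINS_HOME/jenkins-slave/workspace" 20
```

Running this before and after a change to a job gives a rough idea of
whether the change helps, at least for the node the step happens to run on.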

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Karl Heinz Marbaise <kh...@gmx.de>.

Hi Joan,

thanks for starting this thread...

I've also seen many issues about that, and the Maven team has been
affected by them.

I have started to change our global Jenkins lib to handle that,
because I think we have so many builds that in the end we will consume
a lot of disk space.

Based on those issues I have done some calculations for our Maven builds.

We have 96 components/plugins etc., which means 96 jobs to start with
(multibranch pipelines).

Each of those has at least a master branch, which means the lowest
number of builds is 96 (let's say 100 for the sake of convenience), but
each of them is built with JDK 7, JDK 8, JDK 9 and JDK 10 (see [2]),
which results in a minimum of:

4 * 100 builds (workspaces)...and that's not all...

So I think it would be great to have a list of disk space usage on
the nodes, broken down by project/job.

@INFRA
Is there a way I could get this information on my own (without bothering
the INFRA team) to see if my changes are helping?

Kind regards
Karl Heinz Marbaise
Apache Maven PMC


[1]: https://builds.apache.org/view/M-R/view/Maven/job/maven-box/
[2]: 
https://builds.apache.org/view/M-R/view/Maven/job/maven-box/job/maven-acr-plugin/job/master/workflow-stage/



Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Dan Kirkwood <da...@gmail.com>.

Hi Gav,

I'd certainly appreciate any help in that area.   I'm guessing those are
the build artifact rpms?  In that case,  it should not be keeping them
around for more than a few days as this job is only to vet new PRs.

Do I have the ability to examine those areas?   I do have it set to discard
old builds more than 5 days old.   I believe the artifacts should be
deleted with them.

-dan
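
To check whether old PR workspaces are lingering on a node, a dry run with
find can list directories that have not been modified for a given number
of days. This is a sketch under assumptions: stale_workspaces and its
arguments are hypothetical names, the 5-day default simply mirrors the
retention mentioned above, and GNU or BSD find is assumed. It only prints
candidates and deletes nothing. Discarding old builds in Jenkins generally
removes build records and archived artifacts on the master and is not
expected to touch workspace directories on the build nodes, so the two can
drift apart.

```shell
#!/bin/sh
# Hypothetical helper: print workspace directories that look stale.
#   $1 = workspace root directory
#   $2 = directory name prefix to match
#   $3 = age threshold in days (default 5)
stale_workspaces() {
    root="$1"
    prefix="$2"
    days="${3:-5}"
    # -mindepth 1 -maxdepth 1 : only look at direct children of the root
    # -mtime +N               : age in whole days is greater than N
    # Note: a directory's mtime only changes when entries are added or
    # removed directly inside it, so this is a heuristic.
    find "$root" -mindepth 1 -maxdepth 1 -type d \
        -name "${prefix}*" -mtime +"$days" -print
}

# Example (path and prefix are assumptions):
#   stale_workspaces "$JENKINS_HOME/jenkins-slave/workspace" "trafficcontrol-PR" 5
```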


On Fri, Jul 20, 2018 at 6:26 AM Gav <ip...@gmail.com> wrote:

> Actually Dan, in the 'workspace' area of at least one node I found :-
>
> $JENKINS_HOME/jenkins-slave/workspace/trafficcontrol-PR_ws-space*
>
> The size averaged around 350MB each, not big in itself, but when you
> multiply that by 6750 such directories, that makes over 100GB.
>
> As such, I'd like to understand what these directories are, how they are
> being made, and what the job configs are.
>
> Thanks
>
>
>
> On Fri, Jul 20, 2018 at 3:36 AM Dan Kirkwood <da...@apache.org> wrote:
>
> > TrafficControl uses docker-compose to build each component,  so the disk
> > space used is within docker's space -- not the workspace.   We do attempt
> > to clean up after each build,  but would be happy to get advice to improve
> > how we're doing it.   Are there best practices others use?
> >
> > thanks..   Dan Kirkwood
> >
> >
>
> --
> Gav...
>

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Gav <ip...@gmail.com>.

Actually Dan, in the 'workspace' area of at least one node I found :-

$JENKINS_HOME/jenkins-slave/workspace/trafficcontrol-PR_ws-space*

The size averaged around 350MB each, not big in itself, but when you
multiply that by 6750 such directories, that makes over 100GB.

As such, I'd like to understand what these directories are, how they are
being made, and what the job configs are.

Thanks
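
The count and total behind an estimate like this can be reproduced with
find and du. A minimal sketch, assuming GNU or BSD find and a flat
workspace root; summarize_workspaces and its arguments are hypothetical
names, and the prefix has to be adapted to the job in question.

```shell
#!/bin/sh
# Hypothetical helper: count the directories under a root whose names
# start with a prefix, and add up their disk usage.
#   $1 = workspace root directory
#   $2 = directory name prefix to match
summarize_workspaces() {
    root="$1"
    prefix="$2"
    count=$(find "$root" -mindepth 1 -maxdepth 1 -type d \
        -name "${prefix}*" | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "0 directories, 0 KiB"
        return 0
    fi
    # du -sk prints one "<KiB><TAB><path>" line per directory;
    # awk adds up the first column.
    total_kib=$(find "$root" -mindepth 1 -maxdepth 1 -type d \
        -name "${prefix}*" -exec du -sk {} + |
        awk '{ sum += $1 } END { print sum + 0 }')
    echo "$count directories, $total_kib KiB"
}

# Example (path and prefix are assumptions):
#   summarize_workspaces "$JENKINS_HOME/jenkins-slave/workspace" "trafficcontrol-PR_ws-space"
```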



On Fri, Jul 20, 2018 at 3:36 AM Dan Kirkwood <da...@apache.org> wrote:

> TrafficControl uses docker-compose to build each component,  so the disk
> space used is within docker's space -- not the workspace.   We do attempt
> to clean up after each build,  but would be happy to get advice to improve
> how we're doing it.   Are there best practices others use?
>
> thanks..   Dan Kirkwood
>
>


-- 
Gav...

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Dominik Psenner <dp...@gmail.com>.

Would there be a way to automate an rm -rf of a build workspace when the
build has completed? Doing so would make the problem disappear. Sure,
builds would take a little more time, and disk, network and CPU usage
would grow. But builds would start from a pristine workspace, which is
desirable from my POV, and the full-disk problem disappears too, so long
as a single job does not use more than the space available on a build
machine.
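
In a freestyle job with a shell build step, one way to approximate this is
a wrapper that runs the build command and then empties the workspace
whatever the result. The sketch below is an illustration under
assumptions, not an existing Jenkins feature: wipe_workspace and
run_build_with_cleanup are made-up names, GNU or BSD find is assumed, and
WORKSPACE is the variable Jenkins sets to the job's workspace directory.
Pipeline jobs would more likely use a built-in step such as deleteDir()
instead.

```shell
#!/bin/sh
# Hypothetical helpers, assuming GNU or BSD find (for -mindepth and -delete).

# Empty a workspace directory but keep the directory itself.
wipe_workspace() {
    ws="$1"
    # Refuse to act on an empty path or the filesystem root
    case "$ws" in
        ""|"/") echo "refusing to wipe '$ws'" >&2; return 1 ;;
    esac
    [ -d "$ws" ] || return 0
    # -mindepth 1 skips the workspace directory itself; dotfiles are included
    find "$ws" -mindepth 1 -delete
}

# Run a build command inside the workspace, wipe the workspace afterwards,
# and hand back the build's own exit status.
run_build_with_cleanup() {
    build_ws="$1"
    shift
    status=0
    ( cd "$build_ws" && "$@" ) || status=$?
    wipe_workspace "$build_ws"
    return "$status"
}

# Example: run_build_with_cleanup "$WORKSPACE" mvn -B clean verify
```

This does not cover a build that is killed by a signal; a trap on EXIT, as
in Mike's message in this thread, would be needed for that.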

On Fri, 20 Jul 2018, 19:52 Dan Kirkwood, <da...@apache.org> wrote:

> Thanks,  Mike..
>
> I've added some better cleanup using trap, which will hopefully
> lessen our contribution to the problem.
>
> -dan
>
> On 2018/07/20 16:43:31, Mike Jumper <mj...@apache.org> wrote:
> > On Thu, Jul 19, 2018, 10:36 Dan Kirkwood <da...@apache.org> wrote:
> >
> > > TrafficControl uses docker-compose to build each component,  so the disk
> > > space used is within docker's space -- not the workspace.   We do attempt
> > > to clean up after each build,  but would be happy to get advice to improve
> > > how we're doing it.   Are there best practices others use?
> > >
> >
> > We've been using Docker for Guacamole's Jenkins builds, as well,
> > autogenerating a Dockerfile which performs the actual build within a
> > pristine environment.
> >
> > Not sure how different things would need to be with docker-compose, but we
> > use a combination of the "trap" command (to ensure cleanup happens
> > regardless of build result) and the "--no-cache" and "--rm" flags:
> >
> >     # Remove image regardless of build result
> >     export TAG="guac-${BUILD_TAG}"
> >     trap "docker rmi --force $TAG || true" EXIT
> >
> >     # Perform build
> >     docker build --no-cache=true --rm --tag "$TAG" .
> >
> > - Mike
> >
>

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Dan Kirkwood <da...@apache.org>.

Thanks,  Mike..

I've added some better cleanup using trap, which will hopefully lessen our contribution to the problem.

-dan

On 2018/07/20 16:43:31, Mike Jumper <mj...@apache.org> wrote: 
> On Thu, Jul 19, 2018, 10:36 Dan Kirkwood <da...@apache.org> wrote:
> 
> > TrafficControl uses docker-compose to build each component,  so the disk
> > space used is within docker's space -- not the workspace.   We do attempt
> > to clean up after each build,  but would be happy to get advice to improve
> > how we're doing it.   Are there best practices others use?
> >
> 
> We've been using Docker for Guacamole's Jenkins builds, as well,
> autogenerating a Dockerfile which performs the actual build within a
> pristine environment.
> 
> Not sure how different things would need to be with docker-compose, but we
> use a combination of the "trap" command (to ensure cleanup happens
> regardless of build result) and the "--no-cache" and "--rm" flags:
> 
>     # Remove image regardless of build result
>     export TAG="guac-${BUILD_TAG}"
>     trap "docker rmi --force $TAG || true" EXIT
> 
>     # Perform build
>     docker build --no-cache=true --rm --tag "$TAG" .
> 
> - Mike
>

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Mike Jumper <mj...@apache.org>.

On Thu, Jul 19, 2018, 10:36 Dan Kirkwood <da...@apache.org> wrote:

> TrafficControl uses docker-compose to build each component,  so the disk
> space used is within docker's space -- not the workspace.   We do attempt
> to clean up after each build,  but would be happy to get advice to improve
> how we're doing it.   Are there best practices others use?
>

We've been using Docker for Guacamole's Jenkins builds, as well,
autogenerating a Dockerfile which performs the actual build within a
pristine environment.

Not sure how different things would need to be with docker-compose, but we
use a combination of the "trap" command (to ensure cleanup happens
regardless of build result) and the "--no-cache" and "--rm" flags:

    # Remove image regardless of build result
    export TAG="guac-${BUILD_TAG}"
    trap "docker rmi --force $TAG || true" EXIT

    # Perform build
    docker build --no-cache=true --rm --tag "$TAG" .

- Mike
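
For docker-compose, a comparable cleanup might look like the following.
This is a sketch, not TrafficControl's or Guacamole's actual setup:
compose_build, compose_cleanup and the COMPOSE override are hypothetical
names, and the project name in the usage comment is only an example. The
docker-compose options used (-p, build --no-cache, down --rmi local
--volumes --remove-orphans) are standard ones.

```shell
#!/bin/sh
# Hypothetical helpers; COMPOSE can be overridden, defaults to docker-compose.

# Build all services for one project without reusing cached layers.
compose_build() {
    project="$1"
    ${COMPOSE:-docker-compose} -p "$project" build --no-cache
}

# Tear down everything the project created. Never fails the build itself.
compose_cleanup() {
    project="$1"
    # down              stops and removes the project's containers and networks
    # --rmi local       also removes images that have no custom "image:" tag
    # --volumes         also removes the project's named and anonymous volumes
    # --remove-orphans  also removes containers of services no longer defined
    ${COMPOSE:-docker-compose} -p "$project" down \
        --rmi local --volumes --remove-orphans || true
}

# Typical use inside a build script:
#   PROJECT="tc-${BUILD_NUMBER}"
#   trap 'compose_cleanup "$PROJECT"' EXIT
#   compose_build "$PROJECT"
```

Using a per-build project name keeps concurrent builds on the same node
from tearing down each other's containers.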

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Dan Kirkwood <da...@apache.org>.

TrafficControl uses docker-compose to build each component,  so the disk space used is within docker's space -- not the workspace.   We do attempt to clean up after each build,  but would be happy to get advice to improve how we're doing it.   Are there best practices others use?

thanks..   Dan Kirkwood



Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Gav <ip...@gmail.com>.

Yeah, we have had this plugin in use for quite some time now here at the ASF.
Every so often we reset builds to a sensible level.

Gav...


On Wed, Jul 18, 2018 at 10:28 PM Robert Munteanu <ro...@apache.org> wrote:

> On Wed, 2018-07-18 at 13:49 +0200, Zoran Regvart wrote:
> > Hello,
> > I've been using the Configuration Slicing plugin [1] for a good while now.
> > If we can agree on a discard policy in terms of days or number of
> > builds kept, it's an easy way to enforce it, at least for non-pipeline
> > jobs.
>
> Note that some jobs - at least the Apache Sling ones - are
> automatically generated so any 'manual' changes will be overwritten.
>
> Robert
>
> > [1] https://plugins.jenkins.io/configurationslicing
>


-- 
Gav...

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Robert Munteanu <ro...@apache.org>.

On Wed, 2018-07-18 at 13:49 +0200, Zoran Regvart wrote:
> Hello,
> I've been using the Configuration Slicing plugin [1] for a good while now.
> If we can agree on a discard policy in terms of days or number of
> builds kept, it's an easy way to enforce it, at least for non-pipeline
> jobs.

Note that some jobs - at least the Apache Sling ones - are
automatically generated so any 'manual' changes will be overwritten.

Robert

> [1] https://plugins.jenkins.io/configurationslicing

Re: Jenkins build hosts filling up...needs everyone's help!

Posted by Zoran Regvart <zo...@regvart.com>.

Hello,
I've been using the Configuration Slicing plugin [1] for a good while now.
If we can agree on a discard policy in terms of days or number of
builds kept, it's an easy way to enforce it, at least for non-pipeline
jobs.

[1] https://plugins.jenkins.io/configurationslicing




-- 
Zoran Regvart