Posted to builds@apache.org by Andrew Bayer <an...@gmail.com> on 2015/08/18 17:16:58 UTC

Build slave capacity on builds.apache.org

Hey all -

So as you may have noticed, we've seen an increase in build utilization on
builds.a.o in the last few months - which is great! The problem is that
we're seeing more demand than we have resources at this point, and that's
only going to increase. We're working on getting more slaves lined up, but
don't yet have a timeframe for that - we'll let you all know when we have
more news.

A.

Re: Build slave capacity on builds.apache.org

Posted by Andrew Bayer <an...@gmail.com>.
Yeah, I don't think we need physical nodes - leasing hosts somewhere would
be perfectly fine. If we did go with physical nodes, I'd guess that five
would probably be fine for the next year, split into multiple
VMs/containers.

I'm working on getting stats/graphs - should hopefully have that ready in a
week or so.

A.

On Thu, Aug 20, 2015 at 11:01 AM, David Nalley <da...@gnsa.us> wrote:

> On Thu, Aug 20, 2015 at 6:32 AM, Gavin McDonald <gm...@apache.org>
> wrote:
> >
> >> On 20 Aug 2015, at 2:18 am, David Nalley <da...@gnsa.us> wrote:
> >>
> >> <snip>
> >
> >> ...how many additional
> >> build nodes/executors will satiate our current demand for capacity?
> >
> > I know this was not aimed at me but I’ll give my opinion.
> >
> > From what I’ve seen of the current growth over the last year; and to
> > allow for future expansion over the next 2 years I would look to get
> > another 20 static slave machines/vms , each running 2 executors.
> >
>
> Just for some perspective keep in mind that we are 4 months into the
> current budget year. The total hardware/cloud budget for all of the
> ASF is 90k.
> 20 additional physical nodes is a 200k capital outlay if we purchase
> hardware, and 50k/yr amortized over 4 years, and likely another rack
> of power and cooling.
> At RAX that's $4321/month and $51,852/year.
> The cheapest option would be going to the group Daniel found in France
> for ELK, while not nearly as flexible as a true cloud system, we could
> run 20 nodes there for $1500 per month, or $18,000 per year.
> And that's an increase on what we are currently spending, not the
> total. We can ask some of our infra sponsors for that, but we need to
> plan that out, and deal with the risk that involves. We are already
> heavily dependent on a single sponsor for the vast majority of our
> physical CI nodes.
>
> That said, we really need to understand the demand first before we
> decide to spend tens or hundreds of thousands of dollars. In many
> ways, I expected Travis to decrease load on Jenkins, but we continue
> to expand. It'd be interesting to see graphs of # of jobs over time,
> how that's increasing, as well as what projects are currently
> consuming build time.
>
> --David
>

Re: Build slave capacity on builds.apache.org

Posted by David Nalley <da...@gnsa.us>.
On Thu, Aug 20, 2015 at 6:32 AM, Gavin McDonald <gm...@apache.org> wrote:
>
>> On 20 Aug 2015, at 2:18 am, David Nalley <da...@gnsa.us> wrote:
>>
>> <snip>
>
>> ...how many additional
>> build nodes/executors will satiate our current demand for capacity?
>
> I know this was not aimed at me but I’ll give my opinion.
>
> From what I’ve seen of the current growth over the last year; and to allow for
> future expansion over the next 2 years I would look to get another 20 static
> slave machines/vms , each running 2 executors.
>

Just for some perspective keep in mind that we are 4 months into the
current budget year. The total hardware/cloud budget for all of the
ASF is 90k.
20 additional physical nodes is a 200k capital outlay if we purchase
hardware, and 50k/yr amortized over 4 years, and likely another rack
of power and cooling.
At RAX that's $4321/month and $51,852/year.
The cheapest option would be going to the group Daniel found in France
for ELK; while not nearly as flexible as a true cloud system, we could
run 20 nodes there for $1500 per month, or $18,000 per year.
And that's an increase on what we are currently spending, not the
total. We can ask some of our infra sponsors for that, but we need to
plan that out, and deal with the risk that involves. We are already
heavily dependent on a single sponsor for the vast majority of our
physical CI nodes.
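
As a rough sketch of the arithmetic above, the three options can be put on the same per-year footing. The dollar figures are the ones quoted in this message; the option labels and the "share of budget" column are illustrative additions, not part of any actual budget process.

```python
MONTHS_PER_YEAR = 12
ANNUAL_BUDGET = 90_000  # total ASF hardware/cloud budget quoted above


def annual_from_monthly(monthly_usd):
    """Convert a monthly hosting price into a yearly cost."""
    return monthly_usd * MONTHS_PER_YEAR


def amortized_per_year(capital_usd, years):
    """Spread a one-off hardware purchase evenly over its lifetime."""
    return capital_usd / years


# Labels are illustrative; the numbers come from the message.
options = {
    "purchase hardware": amortized_per_year(200_000, 4),
    "RAX": annual_from_monthly(4321),
    "provider in France": annual_from_monthly(1500),
}

# Print cheapest first, with each option's share of the annual budget.
for name, cost in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:,.0f}/yr ({cost / ANNUAL_BUDGET:.0%} of budget)")
```

The purchase figure leaves out the extra rack, power and cooling mentioned above, so it understates the real cost of that option.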

That said, we really need to understand the demand first before we
decide to spend tens or hundreds of thousands of dollars. In many
ways, I expected Travis to decrease load on Jenkins, but we continue
to expand. It'd be interesting to see graphs of # of jobs over time,
how that's increasing, as well as what projects are currently
consuming build time.

--David

Re: Build slave capacity on builds.apache.org

Posted by Gavin McDonald <gm...@apache.org>.
> On 20 Aug 2015, at 2:18 am, David Nalley <da...@gnsa.us> wrote:
> 
> <snip>

> ...how many additional
> build nodes/executors will satiate our current demand for capacity?

I know this was not aimed at me but I’ll give my opinion.

From what I’ve seen of the current growth over the last year, and to allow for
future expansion over the next 2 years, I would look to get another 20 static
slave machines/VMs, each running 2 executors.

Gav…

> 
> 
> --David
> 
> 
> On Tue, Aug 18, 2015 at 11:16 AM, Andrew Bayer <an...@gmail.com> wrote:
>> Hey all -
>> 
>> So as you may have noticed, we've seen an increase in build utilization on
>> builds.a.o in the last few months - which is great! The problem is that
>> we're seeing more demand than we have resources at this point, and that's
>> only going to increase. We're working on getting more slaves lined up, but
>> don't yet have a timeframe for that - we'll let you all know when we have
>> more news.
>> 
>> A.



Re: Build slave capacity on builds.apache.org

Posted by Gavin McDonald <ga...@16degrees.com.au>.
> On 20 Aug 2015, at 3:28 am, David Nalley <da...@gnsa.us> wrote:
> 
> So, just spot checking some of the dynamic build slaves - historically
> those have spun up a few hours at a time, to give us additional
> capacity when our queue spiked - but it looks like the slaves are
> staying online pretty much all of the time. Right now we are
> configured to have 12 slaves. Assuming they are staying online all the
> time, that's roughly $2600 per month that we are spending on build
> slave capacity - I looked at our last bill ending July 24th, and we
> spent about $1200 on build slave capacity - so this is about 2x what
> we were spending - I am curious why the sudden uptick in demand -
> what's changed?

We have seen more projects starting to use Jenkins, and existing projects
are also increasing their tests.

The demand for the dynamic slaves seems to come from the static slaves being
less and less available, so dynamic slaves are being spun up much more often.

So no projects that I can see are targeting the dynamic slaves specifically.

Gav…

> 
> --David
> 
> On Wed, Aug 19, 2015 at 9:18 PM, David Nalley <da...@gnsa.us> wrote:
>> Andrew:
>> 
>> I know Jenkins tracks how big the queue is, is that something that we
>> can graph over time? It'd be interesting to know how that's changed
>> and will change, otherwise we'll constantly be fighting fires here.
>> 
>> I'd like to have the following items graphed:
>> 
>> # of dynamic slaves in operation
>> # of executors currently in use
>> # of jobs in the queue
>> # of jobs in the queue actually waiting on an executor.
>> 
>> It'd be nice to know the average time waiting on an executor as well.
>> 
>> 
>> I don't suppose Jenkins has any functionality that would allow us to
>> run a quota per-project is there?
>> 
>> Since you're one of the resident Jenkins experts how many additional
>> build nodes/executors will satiate our current demand for capacity?
>> 
>> 
>> --David
>> 
>> 
>> On Tue, Aug 18, 2015 at 11:16 AM, Andrew Bayer <an...@gmail.com> wrote:
>>> Hey all -
>>> 
>>> So as you may have noticed, we've seen an increase in build utilization on
>>> builds.a.o in the last few months - which is great! The problem is that
>>> we're seeing more demand than we have resources at this point, and that's
>>> only going to increase. We're working on getting more slaves lined up, but
>>> don't yet have a timeframe for that - we'll let you all know when we have
>>> more news.
>>> 
>>> A.

Gav...

          (    (      (                                                                          
   (      )\ ) )\ )   )\ )       (                       )                    )                  
   )\    (()/((()/(  (()/(       )\ )  (       )      ( /( (      (        ( /(   (   (      (   
((((_)(   /(_))/(_))  /(_)) (   (()/(  )(   ( /(  (   )\()))(    ))\   (   )\()) ))\  )(    ))\  
 )\ _ )\ (_)) (_))_| (_))   )\ ) /(_))(()\  )(_)) )\ (_))/(()\  /((_)  )\ (_))/ /((_)(()\  /((_) 
 (_)_\(_)/ __|| |_   |_ _| _(_/((_) _| ((_)((_)_ ((_)| |_  ((_)(_))(  ((_)| |_ (_))(  ((_)(_))   
  / _ \  \__ \| __|   | | | ' \))|  _|| '_|/ _` |(_-<|  _|| '_|| || |/ _| |  _|| || || '_|/ -_)  
 /_/ \_\ |___/|_|    |___||_||_| |_|  |_|  \__,_|/__/ \__||_|   \_,_|\__|  \__| \_,_||_|  \___|  
                                                                                                 





Re: Build slave capacity on builds.apache.org

Posted by Gavin McDonald <ga...@16degrees.com.au>.
> On 20 Aug 2015, at 7:55 am, Daan Hoogland <da...@gmail.com> wrote:
> 
> cloudstack...! we started making more intensive use of pull-builders. the
> old pull-request build job is replaced by a rat and an analysis job. Maybe
> others have done so as well but I am sure we (ACS) are a culprit in this.

From what I can tell the CloudStack builds are not targeting the dynamic slaves
on purpose but are using the generic ‘ubuntu’ label.

So what is happening is that the ubuntu-* machines are more or less in constant
use so the dynamic slaves are getting triggered more and more.

The dynamic slaves also carry the ‘ubuntu’ label, which is why.

The way to reduce the use of the dynamic slaves, and therefore the cost of
having them on more and more, is to increase our pool of static slaves.
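
The overflow behaviour described above can be sketched as a toy model: builds with the shared label fill the static executors first, and whatever is left over forces dynamic slaves to spin up. The demand series, the function name and the two executors per dynamic slave are all hypothetical values chosen for illustration, not measurements from builds.a.o.

```python
import math


def dynamic_slave_hours(hourly_demand, static_executors, executors_per_dynamic=2):
    """Dynamic slave-hours needed when static executors are filled first."""
    total = 0
    for demand in hourly_demand:
        # Builds that do not fit on the static pool overflow to dynamic slaves.
        overflow = max(0, demand - static_executors)
        total += math.ceil(overflow / executors_per_dynamic)
    return total


# Hypothetical demand: concurrent 'ubuntu'-labelled builds per hour over a day.
demand = [4, 4, 6, 10, 14, 18, 20, 18, 16, 12, 8, 6] * 2

for static in (8, 12, 16, 20):
    hours = dynamic_slave_hours(demand, static)
    print(f"{static} static executors -> {hours} dynamic slave-hours")
```

In this model every extra static executor can only reduce the dynamic slave-hours, which is the cost argument for growing the static pool.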

Gav…

> 
> On Thu, Aug 20, 2015 at 4:28 AM, David Nalley <da...@gnsa.us> wrote:
> 
>> So, just spot checking some of the dynamic build slaves - historically
>> those have spun up a few hours at a time, to give us additional
>> capacity when our queue spiked - but it looks like the slaves are
>> staying online pretty much all of the time. Right now we are
>> configured to have 12 slaves. Assuming they are staying online all the
>> time, that's roughly $2600 per month that we are spending on build
>> slave capacity - I looked at our last bill ending July 24th, and we
>> spent about $1200 on build slave capacity - so this is about 2x what
>> we were spending - I am curious why the sudden uptick in demand -
>> what's changed?
>> 
>> --David
>> 
>> On Wed, Aug 19, 2015 at 9:18 PM, David Nalley <da...@gnsa.us> wrote:
>>> Andrew:
>>> 
>>> I know Jenkins tracks how big the queue is, is that something that we
>>> can graph over time? It'd be interesting to know how that's changed
>>> and will change, otherwise we'll constantly be fighting fires here.
>>> 
>>> I'd like to have the following items graphed:
>>> 
>>> # of dynamic slaves in operation
>>> # of executors currently in use
>>> # of jobs in the queue
>>> # of jobs in the queue actually waiting on an executor.
>>> 
>>> It'd be nice to know the average time waiting on an executor as well.
>>> 
>>> 
>>> I don't suppose Jenkins has any functionality that would allow us to
>>> run a quota per-project is there?
>>> 
>>> Since you're one of the resident Jenkins experts how many additional
>>> build nodes/executors will satiate our current demand for capacity?
>>> 
>>> 
>>> --David
>>> 
>>> 
>>> On Tue, Aug 18, 2015 at 11:16 AM, Andrew Bayer <an...@gmail.com> wrote:
>>>> Hey all -
>>>> 
>>>> So as you may have noticed, we've seen an increase in build utilization on
>>>> builds.a.o in the last few months - which is great! The problem is that
>>>> we're seeing more demand than we have resources at this point, and that's
>>>> only going to increase. We're working on getting more slaves lined up, but
>>>> don't yet have a timeframe for that - we'll let you all know when we have
>>>> more news.
>>>> 
>>>> A.
>> 
> 
> 
> 
> -- 
> Daan

Gav...

Re: Build slave capacity on builds.apache.org

Posted by Andrew Bayer <an...@gmail.com>.
So it looks like part of the problem is that HBase builds have been
hanging/causing slaves to barf out due to orphaned java processes eating up
resources. They're looking into this over at
https://issues.apache.org/jira/browse/INFRA-10150.

On Thu, Aug 20, 2015 at 11:44 AM, Daan Hoogland <da...@gmail.com>
wrote:

>
>
> On Thu, Aug 20, 2015 at 5:02 PM, David Nalley <da...@gnsa.us> wrote:
>
>> On Thu, Aug 20, 2015 at 2:55 AM, Daan Hoogland <da...@gmail.com>
>> wrote:
>> > cloudstack...!
>>
>> I thought all of the ACS PR builds were happening on Travis?
>>
> only smoke tests, no code analysis.
>
>
>> --David
>>
>
>
>
> --
> Daan
>

Re: Build slave capacity on builds.apache.org

Posted by Daan Hoogland <da...@gmail.com>.
On Thu, Aug 20, 2015 at 5:02 PM, David Nalley <da...@gnsa.us> wrote:

> On Thu, Aug 20, 2015 at 2:55 AM, Daan Hoogland <da...@gmail.com>
> wrote:
> > cloudstack...!
>
> I thought all of the ACS PR builds were happening on Travis?
>
only smoke tests, no code analysis.


> --David
>



-- 
Daan

Re: Build slave capacity on builds.apache.org

Posted by David Nalley <da...@gnsa.us>.
On Thu, Aug 20, 2015 at 2:55 AM, Daan Hoogland <da...@gmail.com> wrote:
> cloudstack...! we started making more intensive use of pull-builders. the
> old pull-request build job is replaced by a rat and an analysis job. Maybe
> others have done so as well but I am sure we (ACS) are a culprit in this.
>


I thought all of the ACS PR builds were happening on Travis?

--David

Re: Build slave capacity on builds.apache.org

Posted by Daan Hoogland <da...@gmail.com>.
CloudStack...! We started making more intensive use of pull-builders. The
old pull-request build job has been replaced by a RAT job and an analysis job.
Maybe others have done so as well, but I am sure we (ACS) are a culprit in this.

On Thu, Aug 20, 2015 at 4:28 AM, David Nalley <da...@gnsa.us> wrote:

> So, just spot checking some of the dynamic build slaves - historically
> those have spun up a few hours at a time, to give us additional
> capacity when our queue spiked - but it looks like the slaves are
> staying online pretty much all of the time. Right now we are
> configured to have 12 slaves. Assuming they are staying online all the
> time, that's roughly $2600 per month that we are spending on build
> slave capacity - I looked at our last bill ending July 24th, and we
> spent about $1200 on build slave capacity - so this is about 2x what
> we were spending - I am curious why the sudden uptick in demand -
> what's changed?
>
> --David
>
> On Wed, Aug 19, 2015 at 9:18 PM, David Nalley <da...@gnsa.us> wrote:
> > Andrew:
> >
> > I know Jenkins tracks how big the queue is, is that something that we
> > can graph over time? It'd be interesting to know how that's changed
> > and will change, otherwise we'll constantly be fighting fires here.
> >
> > I'd like to have the following items graphed:
> >
> > # of dynamic slaves in operation
> > # of executors currently in use
> > # of jobs in the queue
> > # of jobs in the queue actually waiting on an executor.
> >
> > It'd be nice to know the average time waiting on an executor as well.
> >
> >
> > I don't suppose Jenkins has any functionality that would allow us to
> > run a quota per-project is there?
> >
> > Since you're one of the resident Jenkins experts how many additional
> > build nodes/executors will satiate our current demand for capacity?
> >
> >
> > --David
> >
> >
> > On Tue, Aug 18, 2015 at 11:16 AM, Andrew Bayer <an...@gmail.com> wrote:
> >> Hey all -
> >>
> >> So as you may have noticed, we've seen an increase in build utilization on
> >> builds.a.o in the last few months - which is great! The problem is that
> >> we're seeing more demand than we have resources at this point, and that's
> >> only going to increase. We're working on getting more slaves lined up, but
> >> don't yet have a timeframe for that - we'll let you all know when we have
> >> more news.
> >>
> >> A.
>



-- 
Daan

Re: Build slave capacity on builds.apache.org

Posted by David Nalley <da...@gnsa.us>.
So, just spot checking some of the dynamic build slaves - historically
those have spun up a few hours at a time, to give us additional
capacity when our queue spiked - but it looks like the slaves are
staying online pretty much all of the time. Right now we are
configured to have 12 slaves. Assuming they are staying online all the
time, that's roughly $2600 per month that we are spending on build
slave capacity - I looked at our last bill ending July 24th, and we
spent about $1200 on build slave capacity - so this is about 2x what
we were spending - I am curious why the sudden uptick in demand -
what's changed?
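
A back-of-the-envelope sketch of what those two figures imply, assuming the bill scales linearly with slave-hours and a month of 730 hours (both are assumptions, as are the variable names):

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12

slaves = 12                # configured dynamic slaves
always_on_monthly = 2600   # "roughly $2600 per month" if always online
last_bill = 1200           # "about $1200" on the bill ending July 24th

# Hourly price per slave implied by the always-on estimate.
implied_hourly_rate = always_on_monthly / (slaves * HOURS_PER_MONTH)

# Slave-hours the last bill would have paid for at that rate.
billed_slave_hours = last_bill / implied_hourly_rate

# Average number of dynamic slaves online during the last billing period.
average_slaves_online = billed_slave_hours / HOURS_PER_MONTH

print(f"implied rate: ${implied_hourly_rate:.2f} per slave-hour")
print(f"average slaves online last period: {average_slaves_online:.1f} of {slaves}")
```

Under those assumptions the last bill corresponds to a little under half of the configured slaves being online on average.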

--David

On Wed, Aug 19, 2015 at 9:18 PM, David Nalley <da...@gnsa.us> wrote:
> Andrew:
>
> I know Jenkins tracks how big the queue is, is that something that we
> can graph over time? It'd be interesting to know how that's changed
> and will change, otherwise we'll constantly be fighting fires here.
>
> I'd like to have the following items graphed:
>
> # of dynamic slaves in operation
> # of executors currently in use
> # of jobs in the queue
> # of jobs in the queue actually waiting on an executor.
>
> It'd be nice to know the average time waiting on an executor as well.
>
>
> I don't suppose Jenkins has any functionality that would allow us to
> run a quota per-project is there?
>
> Since you're one of the resident Jenkins experts how many additional
> build nodes/executors will satiate our current demand for capacity?
>
>
> --David
>
>
> On Tue, Aug 18, 2015 at 11:16 AM, Andrew Bayer <an...@gmail.com> wrote:
>> Hey all -
>>
>> So as you may have noticed, we've seen an increase in build utilization on
>> builds.a.o in the last few months - which is great! The problem is that
>> we're seeing more demand than we have resources at this point, and that's
>> only going to increase. We're working on getting more slaves lined up, but
>> don't yet have a timeframe for that - we'll let you all know when we have
>> more news.
>>
>> A.

Re: Build slave capacity on builds.apache.org

Posted by David Nalley <da...@gnsa.us>.
Andrew:

I know Jenkins tracks how big the queue is; is that something that we
can graph over time? It'd be interesting to know how that's changed
and will change, otherwise we'll constantly be fighting fires here.

I'd like to have the following items graphed:

# of dynamic slaves in operation
# of executors currently in use
# of jobs in the queue
# of jobs in the queue actually waiting on an executor.

It'd be nice to know the average time waiting on an executor as well.
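
A minimal sketch of how these numbers could be sampled, assuming the Jenkins JSON API is readable at `queue/api/json` and `computer/api/json` without authentication. The `dynamic-` node name prefix and the `summarize`/`poll` helpers are hypothetical, and the wait time only covers items still in the queue at sampling time, so it approximates rather than measures the true average wait.

```python
import json
import time
from urllib.request import urlopen

JENKINS_URL = "https://builds.apache.org"  # host named in this thread


def fetch_json(path):
    """Fetch a Jenkins JSON API document, e.g. 'queue/api/json'."""
    with urlopen(f"{JENKINS_URL}/{path}") as resp:
        return json.load(resp)


def summarize(queue, computers, now_ms=None, dynamic_prefix="dynamic-"):
    """Reduce the two API payloads to the numbers listed above."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    items = queue.get("items", [])
    # 'buildable' items are ready to run and only waiting for a free executor.
    waiting = [i for i in items if i.get("buildable")]
    waits = [(now_ms - i["inQueueSince"]) / 1000 for i in waiting]
    nodes = computers.get("computer", [])
    # Hypothetical naming convention used to tell dynamic nodes apart.
    dynamic_online = [
        n for n in nodes
        if n.get("displayName", "").startswith(dynamic_prefix)
        and not n.get("offline")
    ]
    return {
        "dynamic_nodes_online": len(dynamic_online),
        "busy_executors": computers.get("busyExecutors", 0),
        "total_executors": computers.get("totalExecutors", 0),
        "queue_length": len(items),
        "waiting_on_executor": len(waiting),
        "avg_wait_seconds": sum(waits) / len(waits) if waits else 0.0,
    }


def poll():
    """Take one sample; run on a schedule and store the results to graph them."""
    return summarize(fetch_json("queue/api/json"), fetch_json("computer/api/json"))
```

Calling `poll()` every few minutes and appending the result to a time-series store would be enough to produce the graphs; `summarize` is kept separate from the network call so it can be checked against saved payloads.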


I don't suppose Jenkins has any functionality that would allow us to
run a per-project quota, does it?

Since you're one of the resident Jenkins experts, how many additional
build nodes/executors will satiate our current demand for capacity?


--David


On Tue, Aug 18, 2015 at 11:16 AM, Andrew Bayer <an...@gmail.com> wrote:
> Hey all -
>
> So as you may have noticed, we've seen an increase in build utilization on
> builds.a.o in the last few months - which is great! The problem is that
> we're seeing more demand than we have resources at this point, and that's
> only going to increase. We're working on getting more slaves lined up, but
> don't yet have a timeframe for that - we'll let you all know when we have
> more news.
>
> A.