Posted to user@mesos.apache.org by pi...@Safe-mail.net on 2015/03/07 19:30:03 UTC

unsubscribe

-------- Original Message --------
From: Jeff Schroeder <je...@computer.org>
Apparently from: user-return-2791-pinktie=safe-mail.net@mesos.apache.org
To: Mesos Users <us...@mesos.apache.org>
Subject: Question on Monitoring a Mesos Cluster
Date: Sat, 7 Mar 2015 12:02:00 -0600
 

> I wrote a Python collectd plugin that pulls master stats (only if master/elected == 1) and slave stats from the REST API under /metrics/snapshot and /slave(1)/stats.json respectively, and sends them to Graphite.
> 
> After getting everything working, I built a few dashboards, one of which displays these stats from http://master:5051/metrics/snapshot:
> 
> master/disk_percent
> master/cpus_percent
> master/mem_percent 
>  
> I had assumed that these were something like aggregate cluster utilization, but that seems incorrect in practice. I have a small cluster with ~1T of memory, ~25T of disk, and ~540 CPU cores. I had a dozen or so small tasks running, and launched 500 tasks with 1G of memory and 1 CPU each.
> 
> Now I'd expect to see the disk/cpu/mem percentage metrics above go up considerably. I did notice that cpus_percent went to around 0.94.
>  
> What is the correct way to measure overall cluster utilization for capacity planning? We can have the NOC watch this and simply add more hardware when the number starts getting low.
> 
> Thanks
>  
> -- 
> Jeff Schroeder
> 
> Don't drink and derive, alcohol and analysis don't mix.
> http://www.digitalprognosis.com
> 
>
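
For reference, the question quoted above can be made concrete with a rough Python sketch. It assumes that the master's /metrics/snapshot JSON exposes master/<resource>_total and master/<resource>_used gauges next to the master/<resource>_percent ones, and that these count resources allocated to tasks rather than what the tasks actually consume. That would be consistent with cpus_percent reading about 0.94 after launching 500 one-CPU tasks on ~540 cores, but it is an assumption, not something confirmed in this thread. The function names and the URL argument are hypothetical.

```python
import json
import urllib.request

# Resource names assumed to appear in the snapshot as
# "master/<name>_total" and "master/<name>_used".
RESOURCES = ("cpus", "mem", "disk")


def fetch_snapshot(url, timeout=5.0):
    """Fetch a metrics snapshot and return it as a dict.

    `url` is supplied by the caller, for example the master's
    /metrics/snapshot endpoint; nothing about the host or port is assumed.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def cluster_utilization(snapshot):
    """Return {resource: used / total} as fractions between 0 and 1.

    A resource maps to None when its total is missing or zero, so a
    dashboard can tell "no data" apart from "0% used".
    """
    result = {}
    for name in RESOURCES:
        total = snapshot.get("master/%s_total" % name)
        used = snapshot.get("master/%s_used" % name)
        if not total or used is None:
            result[name] = None
        else:
            result[name] = float(used) / float(total)
    return result
```

Computing the ratio from the _used and _total gauges, instead of graphing the _percent gauge directly, keeps the raw numbers available for capacity planning, for example to alert on how many CPUs remain unallocated rather than on a fraction.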