Posted to dev@helix.apache.org by Kanak Biscuitwala <ka...@hotmail.com> on 2013/10/26 02:35:26 UTC

Helix Health Reporting

Hi,

We're looking at improving health reporting metrics for Helix so that we can hopefully detect failures faster, while keeping reporting lightweight and scalable.

Here's a summary of an exploration I did regarding integrating Helix with Riemann (http://riemann.io), which is a popular monitoring system. It's not done yet, but I'd like to get any feedback and ideas you have.

https://cwiki.apache.org/confluence/display/HELIX/Health+Metrics+Reporting

Thanks,
Kanak

RE: Helix Health Reporting

Posted by Kanak Biscuitwala <ka...@hotmail.com>.
Riemann configuration is code, so I would probably have to write some Clojure to report to Helix and/or ZK if some rule is violated.
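
As a rough illustration of the "callback when a rule is violated" idea, here is a minimal Java sketch. It is not Riemann or Helix code: the real rule would be written in Riemann's Clojure configuration, and the names HealthEvent and ThresholdRule are hypothetical. Only the host, service, and metric fields are borrowed from the shape of a Riemann event.

```java
import java.util.function.Consumer;

/** Hypothetical event type; host/service/metric mirror fields of a Riemann event. */
final class HealthEvent {
    final String host;
    final String service;
    final double metric;

    HealthEvent(String host, String service, double metric) {
        this.host = host;
        this.service = service;
        this.metric = metric;
    }
}

/** Hypothetical rule: invokes a callback when a service's metric exceeds a threshold. */
final class ThresholdRule {
    private final String service;
    private final double threshold;
    private final Consumer<HealthEvent> onViolation;

    ThresholdRule(String service, double threshold, Consumer<HealthEvent> onViolation) {
        this.service = service;
        this.threshold = threshold;
        this.onViolation = onViolation;
    }

    /** Checks one event; events for other services are ignored. */
    void accept(HealthEvent event) {
        if (event.service.equals(service) && event.metric > threshold) {
            // In a real deployment this is where Helix and/or ZK would be notified.
            onViolation.accept(event);
        }
    }
}
```

A caller would construct a rule such as new ThresholdRule("latency", 100.0, handler) and feed it each incoming event. What the handler does (for example, writing to ZK) is left open here.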

The Riemann client is pretty lightweight; it would add riemann-client itself, Google protobuf, Netty, and Yammer Metrics. The Riemann server is really meant to be deployed as a standalone system, so it has a pretty ridiculous number of dependencies. It might make sense to bundle the server as an optional dependency and launch it only if it is present on the classpath. Or we could create a stripped-down version of Riemann ourselves.
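
One way the "launch it only if it exists" approach could look in Java is sketched below. It is a minimal sketch under stated assumptions: HealthReporter, NoOpReporter, and OptionalReporting are hypothetical names, not Helix APIs, and the class name passed in by the caller is whatever marker class the optional dependency provides. The only real API used is Class.forName from the JDK.

```java
import java.util.function.Supplier;

/** Hypothetical reporting interface; not part of Helix. */
interface HealthReporter {
    void report(String service, double metric);
}

/** Fallback used when the optional monitoring dependency is absent. */
final class NoOpReporter implements HealthReporter {
    @Override
    public void report(String service, double metric) {
        // Intentionally does nothing.
    }
}

final class OptionalReporting {
    private OptionalReporting() {}

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalReporting.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Builds the real reporter only when the marker class is present.
     * The supplier is evaluated lazily, so code that references the optional
     * dependency is never touched when the dependency is missing.
     */
    static HealthReporter create(String markerClassName, Supplier<HealthReporter> realReporter) {
        return isOnClasspath(markerClassName) ? realReporter.get() : new NoOpReporter();
    }
}
```

With this shape, use cases that do not need reporting can simply leave the optional jar off the classpath and get the no-op reporter.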

Either way, we could further break helix-core into a few packages: helix-common, helix-participant, helix-controller, helix-tools, etc. We could even introduce a hierarchy; for example, a helix-participant-with-reporting package could depend on helix-participant and riemann-client.

I need a little more time to play with this and see what is possible in terms of reducing dependencies.

Kanak


Re: Helix Health Reporting

Posted by kishore g <g....@gmail.com>.
Can we get some more info on Riemann? When we set a rule/threshold, can we
get a callback?

Do we have an idea of what dependencies this will add?

Can this be made optional? We need to keep the dependencies minimal for
use cases that don't need this functionality.



