Posted to user@ambari.apache.org by Ganesh Viswanathan <ga...@gmail.com> on 2017/03/24 18:40:27 UTC

Ambari's "HBase Regionserver Process" alert thresholds

I am using Ambari's "HBase Regionserver Process" alert with 1.5s as WARNING
threshold and 3600s as CRITICAL threshold. However, when I test this by
turning down the regionserver process, the alert fires off as CRITICAL
directly. Is this a bug?

I am using HDP2.4 with Ambari 2.2.1.0:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Ambari_Users_Guide/content/_hbase_service_alerts.html


Thanks,
Ganesh

Re: Ambari's "HBase Regionserver Process" alert thresholds

Posted by Jonathan Hurley <jh...@hortonworks.com>.
You're right that the AGGREGATE alert doesn't give you the host name of the affected host. You can query the alerts endpoint directly to discover the name of the host:
GET api/v1/clusters/<clusterName>/alerts?Alert/state=CRITICAL&Alert/definition_name=hbase_regionserver_process
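A minimal Python sketch of that query follows. The helper names, the sample host name, and the response shape (an `items` list whose entries carry an `Alert` object with a `host_name` field) are assumptions for illustration and are not taken from this thread. The extra `fields=Alert/host_name` parameter is also an assumption, added in case the default listing omits the host name.

```python
import json
from urllib.parse import quote


def critical_alerts_path(cluster_name, definition_name="hbase_regionserver_process"):
    """Build the alerts query path shown above, relative to the Ambari server URL."""
    return (
        f"api/v1/clusters/{quote(cluster_name)}/alerts"
        f"?Alert/state=CRITICAL&Alert/definition_name={definition_name}"
        "&fields=Alert/host_name"  # assumption: ask for the host name explicitly
    )


def hosts_from_alerts(response_body):
    """Return the sorted, de-duplicated host names found in an alerts response."""
    payload = json.loads(response_body)
    hosts = set()
    for item in payload.get("items", []):
        host = item.get("Alert", {}).get("host_name")
        if host:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical response body, shaped the way this sketch assumes Ambari replies.
sample = json.dumps({
    "items": [
        {"Alert": {"host_name": "rs2.example.com",
                   "state": "CRITICAL",
                   "definition_name": "hbase_regionserver_process"}}
    ]
})
print(critical_alerts_path("mycluster"))
print(hosts_from_alerts(sample))
```

The path still has to be sent to the Ambari server with valid credentials; the sketch covers only building the request and reading the reply.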

Re: Ambari's "HBase Regionserver Process" alert thresholds

Posted by Ganesh Viswanathan <ga...@gmail.com>.
This API call worked to get the state for all regionservers:

/api/v1/clusters/cluster_name/services/HBASE/components/HBASE_REGIONSERVER?fields=host_components/HostRoles/state

I can filter this list for the INSTALLED state to find the stopped one.
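As a rough sketch of that filtering step in Python: the function name and sample hosts are hypothetical, and the response shape (a `host_components` list whose entries carry a `HostRoles` object with `host_name` and `state`) is an assumption about what the call above returns. A stopped component is assumed to report INSTALLED rather than STARTED.

```python
import json


def stopped_regionservers(response_body):
    """Return host names whose HBASE_REGIONSERVER component reports INSTALLED."""
    payload = json.loads(response_body)
    stopped = []
    for component in payload.get("host_components", []):
        roles = component.get("HostRoles", {})
        # INSTALLED means the component is present on the host but not running.
        if roles.get("state") == "INSTALLED":
            stopped.append(roles.get("host_name"))
    return stopped


# Hypothetical response body for a three-node cluster with one stopped regionserver.
sample = json.dumps({
    "host_components": [
        {"HostRoles": {"host_name": "rs1.example.com", "state": "STARTED"}},
        {"HostRoles": {"host_name": "rs2.example.com", "state": "INSTALLED"}},
        {"HostRoles": {"host_name": "rs3.example.com", "state": "STARTED"}},
    ]
})
print(stopped_regionservers(sample))
```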

Thanks!



Re: Ambari's "HBase Regionserver Process" alert thresholds

Posted by Ganesh Viswanathan <ga...@gmail.com>.
Thanks, that explains the behavior when I shut down the regionserver
process and see the CRITICAL alert.

What I am trying to do is set up a WARNING alert for the case when a single
"HBase Regionserver Process" is down and a CRITICAL alert when two or more
regionservers are down. I am also trying to get the hostname where the
regionserver is down in the warning case.

Only the "HBase Regionserver Process" alert gives the name of the host
impacted (I don't get these from "RegionServers Health Summary" and
"Percent RegionServers Available"), hence I am trying to suitably modify
this alert for my use-case. Is there a better way to get the impacted
regionserver host from the Ambari API when RegionServers Health Summary
fires at WARNING level?
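The rule described above (WARNING for one regionserver down, CRITICAL for two or more) could be sketched outside Ambari roughly as follows. This is a hypothetical helper, not an Ambari alert definition, and it assumes the list of down hosts has already been obtained from the Ambari API.

```python
def regionserver_severity(down_hosts):
    """Map a list of down regionserver hosts to a (state, message) pair."""
    count = len(down_hosts)
    if count == 0:
        return "OK", "All regionservers are running"
    # Include the host names so the affected machines are visible in the message.
    message = f"{count} regionserver(s) down: {', '.join(sorted(down_hosts))}"
    if count == 1:
        return "WARNING", message
    return "CRITICAL", message


print(regionserver_severity([]))
print(regionserver_severity(["rs2.example.com"]))
print(regionserver_severity(["rs3.example.com", "rs2.example.com"]))
```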




Re: Ambari's "HBase Regionserver Process" alert thresholds

Posted by Jonathan Hurley <jh...@hortonworks.com>.
I'm not sure what you mean when you say "turn down" the process. If you are shutting down the process, then the port is released and the alert will not be able to make a socket connection, so you will get a CRITICAL right away. The values in the alert are a round-trip time coupled with a socket read time. For the warning, the alert attempts to make a socket connection; if the connection succeeds and is released in under 1.5 seconds, there is no warning. Because you set the CRITICAL value to 3600s but stopped the process, the alert is not going to wait 3600 seconds, since it can detect much faster that the port is not open for a socket connection.
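The behavior described above can be illustrated with a small Python sketch. This is not Ambari's actual alert code; the function name, defaults, and messages are assumptions chosen for illustration. It shows why a closed port yields CRITICAL immediately: the connection attempt fails long before any threshold is reached.

```python
import socket
import time


def check_port(host, port, warning_s=1.5, critical_s=5.0):
    """Classify a TCP port as OK, WARNING, or CRITICAL based on connect time."""
    start = time.monotonic()
    try:
        # The critical threshold doubles as the connect timeout. A refused
        # connection raises at once, so a stopped process never waits it out.
        with socket.create_connection((host, port), timeout=critical_s):
            pass
    except OSError as exc:
        return "CRITICAL", f"Connection failed: {exc}"
    elapsed = time.monotonic() - start
    if elapsed >= warning_s:
        return "WARNING", f"Connected in {elapsed:.3f}s"
    return "OK", f"Connected in {elapsed:.3f}s"
```

Calling `check_port("rs2.example.com", 16030)` against a host where nothing listens on that port (both values are placeholders) would be expected to return a CRITICAL result as soon as the connection is refused, regardless of how large `critical_s` is.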
