Posted to dev@ambari.apache.org by "Siddharth Wagle (JIRA)" <ji...@apache.org> on 2014/09/17 19:01:33 UTC

[jira] [Comment Edited] (AMBARI-5707) Replace Ganglia with high performant and pluggable Metrics System

    [ https://issues.apache.org/jira/browse/AMBARI-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136319#comment-14136319 ] 

Siddharth Wagle edited comment on AMBARI-5707 at 9/17/14 5:01 PM:
------------------------------------------------------------------

*Revised architecture overview*:

*Problems with current system*:
- Ganglia has limited capabilities for analyzing historical data, and new plugins are not easy to write.
- Horizontal scale-out for large clusters is difficult.
- No support for ad hoc queries.
- Not easy to add metrics support for new services added to the stack.
- It is non-trivial to hook up existing time series databases like OpenTSDB to store raw data indefinitely.

*Solution*:
- Replace Ganglia with a bespoke solution based on an embedded HBase that fits all of the above needs.
- Ability to store fine-grained data for a configurable amount of time.
- Ability to write SQL-like queries (via Phoenix) on aggregated metric data sets and visualize the results.
- Provide a pluggable storage API with the ability to forward metric data to external long-term storage.
- Ability to add user-defined metrics and visualize them through Ambari Views.

*Component description*:

- *Host metrics monitor*:
A lightweight Python process running on every managed host, collecting metrics for the managed processes on that host as well as aggregate metrics for the host itself. The collected metrics are pushed to a pre-configured metrics collector, where they are stored for consumption by the Ambari API.
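A minimal sketch of what such a monitor's collect-and-push loop could look like. The collector URL, endpoint path, metric names, and JSON payload shape here are all assumptions for illustration, not the actual wire format:

```python
import json
import os
import time
from urllib import request

# Hypothetical collector endpoint -- the real URL and path are configured per cluster.
COLLECTOR_URL = "http://collector.example.com:6188/ws/v1/timeline/metrics"

def collect_host_metrics(hostname):
    """Gather a sample host-level metric (1-minute load average as a stand-in)."""
    load1, _load5, _load15 = os.getloadavg()
    now_ms = int(time.time() * 1000)
    return {
        "metrics": [
            {
                "metricname": "load_one",   # illustrative metric name
                "hostname": hostname,
                "timestamp": now_ms,
                "metrics": {str(now_ms): load1},
            }
        ]
    }

def push_metrics(payload):
    """POST the JSON payload to the configured collector."""
    req = request.Request(
        COLLECTOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```

In a real monitor this pair would run on a timer, batching several metrics per push rather than one request per datapoint.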

- *Hadoop Metrics Sink*:
An implementation of the Hadoop Metrics Sink interface that pushes data to a configured collector. As part of the implementation, collected metrics are flushed periodically: _putMetric()_ writes data into a bounded buffer cache of fixed size, configurable through hadoop-metrics2.properties.
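The bounded-buffer idea can be sketched as follows (in Python for brevity; the real sink is Java). The class and method names are illustrative, not the actual sink API:

```python
import threading
from collections import deque

class BoundedMetricBuffer:
    """Fixed-size buffer: put_metric() appends, the oldest entries are
    silently dropped once max_size is reached, and a periodic flush()
    drains whatever is buffered for emission to the collector."""

    def __init__(self, max_size=100):
        # deque with maxlen gives the bounded, drop-oldest behavior for free.
        self._buffer = deque(maxlen=max_size)
        self._lock = threading.Lock()

    def put_metric(self, name, value, timestamp):
        with self._lock:
            self._buffer.append((name, value, timestamp))

    def flush(self):
        with self._lock:
            drained = list(self._buffer)
            self._buffer.clear()
        return drained
```

The key property is that a slow or unreachable collector costs bounded memory: the buffer overwrites its oldest entries instead of growing without limit.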

- *Timeline Metrics Collector*:
The metrics collector is a daemon that receives data from registered publishers and can push metrics both to a local metrics store and to external metric storage such as OpenTSDB or HDFS. Additionally, the collector allows aggregators to be plugged in for the collected metric data. Aggregation is performed post-write by aggregator threads that run at a configured time interval and aggregate the data collected within that interval.
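The post-write interval aggregation amounts to bucketing raw datapoints by time window and computing summary statistics per bucket. A minimal sketch (function name and output shape are assumptions for illustration):

```python
def aggregate(datapoints, interval_ms):
    """Group (timestamp_ms, value) pairs into fixed-width time buckets
    and compute min/max/avg/count per bucket, keyed by bucket start."""
    buckets = {}
    for ts, value in datapoints:
        # Align each timestamp to the start of its interval.
        bucket_start = ts - ts % interval_ms
        buckets.setdefault(bucket_start, []).append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in buckets.items()
    }
```

An aggregator thread would run something like this over each elapsed interval and write the per-bucket summaries back to the store, so queries over long ranges read the small aggregate rows rather than every raw datapoint.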

- *Timeline Metrics Store*:
A time series database is ideal for storing metrics data. Its main advantage is variable time buckets: for example, a row key identifying a metric, followed by an arbitrary number of key-value pairs that fit into the time range identified by part of the key. This storage model makes time-based aggregation simple and avoids sparse rows. HBase's deployment modes allow scaling up and down with cluster size, and choosing HBase as the default storage lets storage scale independently and seamlessly from the Metrics Collectors. On top of HBase, Phoenix provides JDBC APIs in place of the regular HBase client APIs to create tables, insert data, and query the HBase data with SQL.
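To make the storage model concrete, here is a hypothetical Phoenix-style DDL and temporal query. The table name, columns, and schema are invented for illustration; the point is the composite primary key (metric, host, time), which is exactly the "metric id followed by a time range" row-key layout described above:

```python
# Hypothetical Phoenix DDL: the composite PRIMARY KEY becomes the HBase
# row key, so rows for one metric/host sort contiguously by time.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS METRIC_RECORD (
    METRIC_NAME VARCHAR NOT NULL,
    HOSTNAME    VARCHAR NOT NULL,
    SERVER_TIME BIGINT  NOT NULL,
    METRIC_AVG  DOUBLE,
    CONSTRAINT pk PRIMARY KEY (METRIC_NAME, HOSTNAME, SERVER_TIME)
)
"""

# A time-range query becomes a cheap row-key range scan in HBase.
QUERY = """
SELECT METRIC_NAME, HOSTNAME, SERVER_TIME, METRIC_AVG
FROM METRIC_RECORD
WHERE METRIC_NAME = ? AND HOSTNAME = ?
  AND SERVER_TIME BETWEEN ? AND ?
"""
```

Because the leading key columns pin the scan to one metric and host, a temporal query touches only the contiguous slice of rows in the requested time range.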

- *Ambari Metrics Service*:
The API design for the Metrics Service should support a GET API using a key and a time range, similar to what exists on the HBase cluster.
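Such a GET API might be driven by query parameters like the ones below. The base URL, port, and parameter names here are assumptions sketched for illustration, not a finalized API:

```python
from urllib.parse import urlencode

# Hypothetical service endpoint -- actual host/port/path would be configured.
BASE_URL = "http://collector.example.com:6188/ws/v1/timeline/metrics"

def build_metrics_query(metric_names, hostname, start_ms, end_ms):
    """Build a GET URL selecting metrics by name and host over a time range."""
    params = {
        "metricNames": ",".join(metric_names),  # illustrative parameter names
        "hostname": hostname,
        "startTime": start_ms,
        "endTime": end_ms,
    }
    return BASE_URL + "?" + urlencode(params)
```

A client would issue a plain HTTP GET against the resulting URL and receive the datapoints for that key and time range.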

- *Ambari Views*:
Ambari Views on top of Phoenix provide ad hoc query capability to the user, along with a View to replace Ganglia Web.



> Replace Ganglia with high performant and pluggable Metrics System
> -----------------------------------------------------------------
>
>                 Key: AMBARI-5707
>                 URL: https://issues.apache.org/jira/browse/AMBARI-5707
>             Project: Ambari
>          Issue Type: Epic
>          Components: ambari-agent, ambari-server
>    Affects Versions: 1.6.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>            Priority: Critical
>         Attachments: MetricsArchLatest.png, Revised archtecture diagram.png
>
>
> *Ambari Metrics System*
> - Ability to collect metrics from Hadoop and other Stack services
> - Ability to retain metrics at a high precision for a configurable time period (say 5 days)
> - Ability to automatically purge metrics after retention period
> - At collection time, provide clear integration point for external system (such as TSDB)
> - At purge time, provide clear integration point for metrics retention by external system
> - Should provide default options for external metrics retention (say “HDFS”)
> - Provide tools / utilities for analyzing metrics in retention system (say “Hive schema, Pig scripts, etc” that can be used with the default retention store “HDFS”)
> *System Requirements*
> - Must be portable and platform independent
> - Must not conflict with any existing metrics system (such as Ganglia)
> - Must not conflict with existing SNMP infra
> - Must not run as root
> - Must have HA story (no SPOF)
> *Usage*
> - Ability to obtain metrics from Ambari REST API (point in time and temporal)
> - Ability to view metric graphs in Ambari Web (currently, fixed)
> - Ability to configure custom metric graphs in Ambari Web (currently, we have metric graphs “fixed” into the UI)
> - Need to improve metric graph “navigation” in Ambari Web (currently, metric graphs do not allow navigation at arbitrary timeframes, but only at ganglia aggregation intervals) 
> - Ability to “view cluster” at point in time (i.e. see all metrics at that point)
> - Ability to define metrics (and how + where to obtain) in Stack Definitions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)