Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2019/07/11 08:00:20 UTC

[GitHub] [hbase] symat commented on a change in pull request #369: HBASE-21606 document meta table load metrics

URL: https://github.com/apache/hbase/pull/369#discussion_r302410216
 
 

 ##########
 File path: src/main/asciidoc/_chapters/ops_mgt.adoc
 ##########
 @@ -1738,6 +1738,83 @@ hbase.regionserver.authenticationFailures::
 hbase.regionserver.mutationsWithoutWALCount ::
   Count of writes submitted with a flag indicating they should bypass the write ahead log
 
+[[rs_meta_metrics]]
+=== Meta Table Load Metrics
+
+The HBase meta table metrics collection feature is available in HBase 1.4+, but it is disabled by default because it
+can affect the performance of the cluster. When enabled, it helps to monitor client access patterns by collecting
+the following statistics:
+
+* number of get, put and delete operations on the `hbase:meta` table
+* number of get, put and delete operations made by the top-N clients
+* number of operations related to each table
+* number of operations related to the top-N regions
+
+When to use the feature::
+  This feature can help to identify hot spots in the meta table by showing the regions or tables where the meta
+  information is modified (e.g. by creating, dropping, splitting or moving tables) or retrieved most frequently. It can
+  also help to find misbehaving client applications by showing which clients use the meta table most heavily, which can
+  for example suggest a lack of meta table buffering or a failure to re-use open client connections in the client
+  application.
+
+.Possible side-effects of enabling this feature
+[WARNING]
+====
+Having a large number of clients and regions in the cluster can cause a large number of metrics to be registered and
+tracked, which can increase the memory and CPU footprint of the HBase region server handling the `hbase:meta` table.
+It can also significantly increase the size of the JMX dump, which can affect the monitoring or log aggregation
+system you use alongside HBase. It is recommended to turn on this feature only while debugging.
+====
+
+Where to find the metrics::
+  Each metric attribute name starts with the `MetaTable_` prefix. For each metric you will see five different
+  JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
+  under the following MBean:
+  `Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`
+
+Configuration::
+  To turn on this feature, you have to enable a custom coprocessor by adding the following section to hbase-site.xml.
+  This coprocessor will run on all HBase RegionServers, but it will only be active (i.e. consume memory / CPU) on
+  the RegionServer hosting the `hbase:meta` table. It produces JMX metrics which can be fetched from the
+  web UI of the given RegionServer or via a simple REST call.
+
+.Enabling the Meta Table Metrics feature
+[source,xml]
+----
+<property>
+    <name>hbase.coprocessor.region.classes</name>
+    <value>org.apache.hadoop.hbase.coprocessor.MetaTableMetrics</value>
+</property>
+----
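As a sketch of consuming these metrics outside a JMX console: a RegionServer's web UI exposes its MBeans as JSON on the `/jmx` endpoint, and the `MetaTable_`-prefixed attributes can be filtered out of that dump. The snippet below is a minimal, hypothetical illustration — the bean name follows the MBean path given above, but the individual attribute names and values are made up for this example, not copied from a real cluster:

```python
import json

# Hypothetical excerpt of a RegionServer /jmx dump. The bean name mirrors
# the MBean path documented above; the MetaTable_* attribute names and
# values are illustrative assumptions only.
sample = json.loads("""
{
  "beans": [
    {
      "name": "Hadoop:service=HBase,name=RegionServer,sub=Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics",
      "MetaTable_get_request_count": 1284,
      "MetaTable_put_request_count": 97,
      "modelerType": "MetaTableMetrics"
    }
  ]
}
""")

def meta_table_metrics(jmx_dump):
    """Collect every attribute starting with the MetaTable_ prefix
    from the MetaTableMetrics coprocessor bean."""
    metrics = {}
    for bean in jmx_dump.get("beans", []):
        if "MetaTableMetrics" in bean.get("name", ""):
            for attr, value in bean.items():
                if attr.startswith("MetaTable_"):
                    metrics[attr] = value
    return metrics

print(meta_table_metrics(sample))
```

In practice the JSON would be fetched from the RegionServer's info port rather than embedded inline; the filtering logic is the same either way.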
+
+.How are the top-N metrics calculated?
+[NOTE]
+====
+The 'top-N' metrics are counted using the lossy counting algorithm, which is designed to identify elements in a
+data stream whose frequency count exceeds a user-given threshold. The frequency computed by this algorithm is not always
+accurate, but it has an error threshold that can be specified by the user as a configuration parameter. The run-time
+space required by the algorithm is inversely proportional to the specified error threshold, hence the larger the error
+parameter, the smaller the footprint and the less accurate the metrics. (See the following paper:
+link:http://www.vldb.org/conf/2002/S10P03.pdf[Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams"])
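To make the algorithm concrete, here is a minimal Python sketch of lossy counting as described in the paper above. It illustrates the technique only and is not the HBase implementation; the function and variable names are made up for this example:

```python
import math

def lossy_count(stream, epsilon):
    """Approximate frequency counts via lossy counting (Manku & Motwani,
    VLDB 2002). Reported counts undercount the true frequency by at most
    epsilon * N, where N is the number of items seen so far."""
    width = math.ceil(1 / epsilon)   # bucket width w = ceil(1/epsilon)
    counts = {}                      # item -> (count, max possible undercount)
    for n, item in enumerate(stream, start=1):
        bucket = math.ceil(n / width)
        if item in counts:
            f, delta = counts[item]
            counts[item] = (f + 1, delta)
        else:
            counts[item] = (1, bucket - 1)
        if n % width == 0:           # prune low-frequency items at each
            counts = {k: (f, d)      # bucket boundary
                      for k, (f, d) in counts.items()
                      if f + d > bucket}
    return counts

# A heavy hitter survives; 20 one-off items are pruned away.
stream = ["client-A"] * 80 + ["rare-%d" % i for i in range(20)]
print(lossy_count(stream, 0.1))  # → {'client-A': (80, 0)}
```

The periodic pruning of items whose count cannot exceed the error threshold is what keeps the memory footprint bounded regardless of stream length.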
+
+You can specify the error rate of the algorithm as a floating-point value between 0 and 1 (exclusive); its default
+value is 0.02. Having the error rate set to `E` and having `N` as the total number of meta table operations, then
+(assuming the random distribution of the activity of low frequency elements) at most `7 / E` meters will be kept and
 
 Review comment:
  I copied that from the original paper, but I think you are right: 'uniform' is more specific than 'random' in the case of a distribution. I will change this as well.
