Posted to issues@hbase.apache.org by "Jonathan Lawlor (JIRA)" <ji...@apache.org> on 2015/04/16 23:21:59 UTC

[jira] [Updated] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

     [ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Lawlor updated HBASE-5980:
-----------------------------------
    Attachment: HBASE-5980-v1.patch

Attaching a patch that exposes the following server-side metrics through the client-side ScanMetrics:
* Number of rows scanned (metric name is "ROWS_SCANNED")
* Number of rows filtered (metric name is "ROWS_FILTERED")

Important notes:
* ScanMetrics now contains a mix of client-side and server-side metrics
* AbstractScanMetrics and ServerSideScanMetrics were added to keep the ScanMetrics class hierarchy clean
* The following new arguments are now supported in scans from the shell:
** GET_ALL_METRICS: boolean indicating whether or not all scan metrics should be output
** GET_METRICS: array of metric keys the user wants to see (this argument trumps GET_ALL_METRICS)
** Example usages:
*** scan 'table', {GET_ALL_METRICS => true}
*** scan 'table', {GET_METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
* Metrics are always output in alphabetical order
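
To illustrate how the two shell arguments interact, here is a minimal, self-contained sketch of the selection rule described above. It is not code from the patch: the class name ScanMetricsSelector and the method select are hypothetical, and it assumes that a missing or empty GET_METRICS array means "not supplied". A TreeMap is used so the result iterates in alphabetical key order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBASE-5980-v1.patch.
final class ScanMetricsSelector {

    /**
     * Chooses which metrics to output.
     *
     * @param all           every metric collected for the scan (name -> value)
     * @param getAllMetrics value of the GET_ALL_METRICS argument
     * @param getMetrics    value of the GET_METRICS argument, or null if absent
     * @return the selected metrics, sorted alphabetically by name
     */
    public static Map<String, Long> select(Map<String, Long> all,
                                           boolean getAllMetrics,
                                           List<String> getMetrics) {
        // TreeMap keeps keys in natural (alphabetical) String order.
        Map<String, Long> selected = new TreeMap<>();

        // Assumption: a non-empty GET_METRICS list trumps GET_ALL_METRICS.
        if (getMetrics != null && !getMetrics.isEmpty()) {
            for (String key : getMetrics) {
                Long value = all.get(key);
                if (value != null) {   // silently skip unknown metric names
                    selected.put(key, value);
                }
            }
        } else if (getAllMetrics) {
            selected.putAll(all);
        }
        return selected;
    }
}
```

With both arguments given, only the keys listed in GET_METRICS are returned, and they come back sorted regardless of the order in which they were requested.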

Discussion points:
* I think the names of the metrics and shell arguments could be improved; I just chose some easy names to show their usage. Thoughts?
* The other metric mentioned above still needs to be added (number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode). Adding new metrics is easy: just specify the new field in ServerSideScanMetrics and add the appropriate tracking calls. I wanted to get some feedback on how these metrics looked first rather than add a bunch of metrics all at once.
* All of the metrics [~lhofhansl] mentioned sound great. In terms of coprocessors, what kind of metrics would be valuable to expose to the client?
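
As a rough idea of what the remaining per-ReturnCode metric could look like, the sketch below counts how often each return code is seen and exposes the counts as named metrics. This is only an illustration under stated assumptions: the ReturnCode enum here is a local stand-in listing a subset of Filter.ReturnCode values, and the class name ReturnCodeCounters, the record method, and the "FILTER_" metric-name prefix are all hypothetical rather than taken from the patch.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not part of HBASE-5980-v1.patch.
final class ReturnCodeCounters {

    /** Local stand-in for Filter.ReturnCode; only a subset of values is listed. */
    public enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    private final EnumMap<ReturnCode, Long> counts = new EnumMap<>(ReturnCode.class);

    /** Tracking call: invoke once for every return code a filter produces. */
    public void record(ReturnCode code) {
        counts.merge(code, 1L, Long::sum);
    }

    /** Exposes the counters as metric name -> value, in alphabetical order. */
    public Map<String, Long> toMetricsMap() {
        Map<String, Long> metrics = new TreeMap<>();
        for (Map.Entry<ReturnCode, Long> entry : counts.entrySet()) {
            // "FILTER_" is an assumed naming prefix, chosen only for this example.
            metrics.put("FILTER_" + entry.getKey().name(), entry.getValue());
        }
        return metrics;
    }
}
```

Codes that never occur are simply absent from the resulting map; whether they should instead be reported as zero is a design choice left open here.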


> Scanner responses from RS should include metrics on rows/KVs filtered
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5980
>                 URL: https://issues.apache.org/jira/browse/HBASE-5980
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, metrics, regionserver
>    Affects Versions: 0.95.2
>            Reporter: Todd Lipcon
>            Priority: Minor
>         Attachments: HBASE-5980-v1.patch
>
>
> Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example:
> - number of rows filtered by row key alone (filterRowKey())
> - number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode
> What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start.


