Posted to issues@hbase.apache.org by "Jesse Yates (JIRA)" <ji...@apache.org> on 2013/03/01 00:45:14 UTC

[jira] [Updated] (HBASE-7958) Statistics per-column family per-region

     [ https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-7958:
-------------------------------

    Attachment: hbase-7958_rough-cut-v0.patch

Attaching patch with what I was thinking for the underlying framework around stats. This is the built-in (non-CP), system stat table implementation.

Things it doesn't have (but should before commit):
 - real system table semantics
 - a clean way to add a statistic tracker
 - all the coprocessor hooks.

This implementation only covers major compactions, because it's a bit harder to gather general statistics during a minor compaction, where you don't see all the keys. However, it's easy to imagine adding hooks elsewhere in the codebase for other kinds of statistics.

I'd recommend starting with the package-info for a general overview, then jumping into DefaultCompactor to see how we hook it up. From there, StatisticsTable would (to me) be a natural next step, and finally MinMaxStatisticTracker gives you a _very simple_ example of how you would write a statistic.
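
To give a feel for what a min/max statistic involves, here is a minimal standalone sketch. It is not code from the patch: the StatisticTracker interface, the MinMaxTracker class, and all method names are hypothetical, and the loop in main only stands in for a major compaction handing every key to the tracker. Keys are compared as unsigned bytes, on the assumption that the statistic should follow HBase's lexicographic row ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class MinMaxSketch {

  /** Hypothetical tracker contract; the interface in the patch may differ. */
  interface StatisticTracker {
    void observe(byte[] rowKey);   // called once per key the compaction writes
    String summary();              // human-readable result
  }

  /** Remembers the smallest and largest key seen, in unsigned byte order. */
  static final class MinMaxTracker implements StatisticTracker {
    private byte[] min;
    private byte[] max;

    @Override
    public void observe(byte[] rowKey) {
      // Copy the key so later changes to the caller's array cannot alter the stat.
      if (min == null || Arrays.compareUnsigned(rowKey, min) < 0) {
        min = rowKey.clone();
      }
      if (max == null || Arrays.compareUnsigned(rowKey, max) > 0) {
        max = rowKey.clone();
      }
    }

    byte[] min() { return min == null ? null : min.clone(); }
    byte[] max() { return max == null ? null : max.clone(); }

    @Override
    public String summary() {
      return "min=" + Arrays.toString(min) + " max=" + Arrays.toString(max);
    }
  }

  public static void main(String[] args) {
    MinMaxTracker tracker = new MinMaxTracker();
    // Stand-in for a major compaction: every key passes through the tracker.
    for (String key : List.of("row-m", "row-a", "row-z")) {
      tracker.observe(key.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println(tracker.summary());
  }
}
```

Because the tracker only ever sees keys through observe, the same class could in principle be driven from a coprocessor hook or from a built-in scanner; which of those calls it is the design question this issue is about.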

Mostly, I'm looking for overall design feedback rather than nits. It's also on RB here: https://reviews.apache.org/r/9686/
                
> Statistics per-column family per-region
> ---------------------------------------
>
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: hbase-7958_rough-cut-v0.patch
>
>
> Originating from this discussion on the dev list: http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables. This allows clients to have a better understanding of the distribution of keys within a table and a given region. We could also surface this information via the UI.
> There are a couple of different proposals from the email; the overview is this:
> We add in something on compactions that gathers stats about the keys that are written and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors - 
> ** advantage - it easily plugs in and people could pretty easily add their own statistics. 
> ** disadvantage - UI elements would also require this, so we get into dependent loading, which leads down the OSGi path. Also, these CPs need to be installed _after_ all the other CPs on compaction to ensure they see exactly what gets written (doable, but a pain)
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place and no need to muck about with loading CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut
> *Where do we store data?*
> # .META.
> ** advantage - it's an existing table, so we can jam it into another CF there
> ** disadvantage - this would make META much larger, possibly leading to splits AND will make it much harder for other processes to read the info
> # A new stats table
> ** advantage - cleanly separates out the information from META
> ** disadvantage - should use a 'system table' idea to prevent accidental deletion or manipulation by arbitrary clients, while still allowing clients to read it.
> Once we have this framework, we can then move to an actual implementation of various statistics.
