Posted to issues@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2020/08/12 19:28:00 UTC

[jira] [Commented] (HBASE-24527) Improve region housekeeping status observability

    [ https://issues.apache.org/jira/browse/HBASE-24527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176551#comment-17176551 ] 

Viraj Jasani commented on HBASE-24527:
--------------------------------------

{quote}at regionserver scope:
 * listing the current state of a regionserver's compaction, split, and merge tasks and threads
 * counting (simple view) and listing (detailed view) a regionserver's compaction queues
 * listing a region's currently compacting, splitting, or merging status

at master scope, aggregations of the above detailed information into:
 * listing the active compaction tasks and threads for a given table, the extension of _compaction_state_ with a new detailed view
 * listing the active split or merge tasks and threads for a given table's regions{quote}
Among the scopes listed here, from an operator's viewpoint the master scope seems more relevant, because we usually want to know what is going on with the regions of the table we are interested in.

For regionserver scope, if we store all region task and thread info at the regionserver, perhaps we should not allow a client to query every RS and aggregate the results, because each RS might hold info for many region tasks. Only one RS should be queried at a time, for the detailed view of a single region.

Master scope can provide the table -> regions mapping (with the hosting RS and current state), and the operator can then query a specific RS for the detailed view of a region. On the other hand, querying all RS with a table/region filter might require too many RPC calls from the client (which the operator is likely to keep repeating until all regions reach their intended states). Hence both of the above scopes, used together, might provide better results, likely with optimal performance.
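
To make the combined lookup concrete, here is a minimal sketch in plain Java. It is not HBase code: MasterView, RegionServerView, RegionSummary, TaskState and activeDetails are all hypothetical names invented for illustration, and the "RPC" counters are simple in-memory integers. It only shows the shape of the idea, assuming the master can return a table -> regions summary and each RS can return detail for one region.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HousekeepingLookupSketch {

    // Hypothetical coarse state; not an existing HBase enum.
    enum TaskState { IDLE, COMPACTING, SPLITTING, MERGING }

    // Hypothetical row of the master-scope view: hosting RS and coarse state of a region.
    static final class RegionSummary {
        final String region;
        final String regionServer;
        final TaskState state;

        RegionSummary(String region, String regionServer, TaskState state) {
            this.region = region;
            this.regionServer = regionServer;
            this.state = state;
        }
    }

    // In-memory stand-in for the master: table -> regions mapping.
    static final class MasterView {
        private final Map<String, List<RegionSummary>> byTable = new LinkedHashMap<>();
        int rpcCount = 0; // simulated number of calls made to the master

        void add(String table, RegionSummary summary) {
            byTable.computeIfAbsent(table, t -> new ArrayList<>()).add(summary);
        }

        List<RegionSummary> regionsOf(String table) {
            rpcCount++;
            return byTable.getOrDefault(table, Collections.<RegionSummary>emptyList());
        }
    }

    // In-memory stand-in for one regionserver holding per-region task detail.
    static final class RegionServerView {
        private final Map<String, String> detailByRegion = new LinkedHashMap<>();
        int rpcCount = 0; // simulated number of calls made to this RS

        void put(String region, String detail) {
            detailByRegion.put(region, detail);
        }

        String detailOf(String region) {
            rpcCount++;
            return detailByRegion.getOrDefault(region, "no active task");
        }
    }

    // Two-step lookup: one master call, then one RS call per region that is not idle.
    static Map<String, String> activeDetails(String table, MasterView master,
                                             Map<String, RegionServerView> servers) {
        Map<String, String> details = new LinkedHashMap<>();
        for (RegionSummary summary : master.regionsOf(table)) {
            if (summary.state == TaskState.IDLE) {
                continue; // idle regions never cost an RS call
            }
            RegionServerView rs = servers.get(summary.regionServer);
            if (rs != null) {
                details.put(summary.region, rs.detailOf(summary.region));
            }
        }
        return details;
    }

    public static void main(String[] args) {
        MasterView master = new MasterView();
        master.add("t1", new RegionSummary("r1", "rs1", TaskState.COMPACTING));
        master.add("t1", new RegionSummary("r2", "rs2", TaskState.IDLE));
        master.add("t1", new RegionSummary("r3", "rs1", TaskState.SPLITTING));

        RegionServerView rs1 = new RegionServerView();
        rs1.put("r1", "major compaction running");
        rs1.put("r3", "split in progress");
        RegionServerView rs2 = new RegionServerView();

        Map<String, RegionServerView> servers = new LinkedHashMap<>();
        servers.put("rs1", rs1);
        servers.put("rs2", rs2);

        System.out.println(activeDetails("t1", master, servers));
        System.out.println("master calls=" + master.rpcCount
                + ", rs1 calls=" + rs1.rpcCount + ", rs2 calls=" + rs2.rpcCount);
    }
}
```

In this sketch the client never contacts an RS whose regions are all idle, which is the saving the combined approach is after. A real design would still need to decide how fresh the master's summary has to be.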

Thoughts?

> Improve region housekeeping status observability
> ------------------------------------------------
>
>                 Key: HBASE-24527
>                 URL: https://issues.apache.org/jira/browse/HBASE-24527
>             Project: HBase
>          Issue Type: New Feature
>          Components: Admin, Compaction, Operability, shell, UI
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> We provide a coarse grained admin API and associated shell command for determining the compaction status of a table:
> {noformat}
> hbase(main):001:0> help "compaction_state"
> Here is some help for this command:
>      Gets compaction status (MAJOR, MAJOR_AND_MINOR, MINOR, NONE) for a table:
>      hbase> compaction_state 'ns1:t1'
>      hbase> compaction_state 't1'
> {noformat}
> We also log compaction activity, including a compaction journal at completion, via log4j to whatever log aggregation solution is available in production.
> This is not sufficient for online and interactive observation, debugging, or performance analysis of current compaction activity. In this kind of activity an operator is attempting to observe and analyze compaction activity in real time. Log aggregation and presentation solutions have typical latencies (end to end visibility of log lines on the order of ~minutes) which make that not possible today.
> We don't offer any API or tools for directly interrogating split and merge activity in real time. Some indirect knowledge of split or merge activity can be inferred from RIT information via ClusterStatus. It can also be scraped, with some difficulty, from the debug servlet. 
> We should have new APIs and shell commands, and perhaps also new admin UI views, for
> at regionserver scope:
> * listing the current state of a regionserver's compaction, split, and merge tasks and threads
> * counting (simple view) and listing (detailed view) a regionserver's compaction queues
> * listing a region's currently compacting, splitting, or merging status
> at master scope, aggregations of the above detailed information into:
> * listing the active compaction tasks and threads for a given table, the extension of _compaction_state_ with a new detailed view
> * listing the active split or merge tasks and threads for a given table's regions
> Compaction detail should include the names of the effective engine and policy classes, and the results and timestamp of the last compaction selection evaluation. Split and merge detail should include the names of the effective policy classes and the result of the last split or merge criteria evaluation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)