Posted to issues@hbase.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2011/09/10 02:42:08 UTC

[jira] [Created] (HBASE-4366) dynamic metrics logging

dynamic metrics logging
-----------------------

                 Key: HBASE-4366
                 URL: https://issues.apache.org/jira/browse/HBASE-4366
             Project: HBase
          Issue Type: New Feature
          Components: metrics
            Reporter: Ming Ma
            Assignee: Ming Ma


First, if there is an existing solution for this, I would close this jira. I also realize we already have various overlapping solutions, and creating another one isn't necessarily the best approach. However, I couldn't find anything that meets the need, so I'm opening this jira for discussion.

We have some scenarios in hbase/mapreduce/hdfs that require logging a large number of dynamic metrics. They can be used for troubleshooting, better measurement of the system, and scorecards. For example,
 
1. HBase: get metrics, such as requests per second, that are specific to a table or column family.
2. MapReduce job history analysis: find all the job ids that were submitted, completed, etc. in a specific time window.
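To make the first example concrete, here is a minimal sketch (class and key names are hypothetical, not an existing HBase API) of per-table request counters. The point is that the metric keys only come into existence when a table is first seen, so the key set cannot be declared up front:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: keys such as "requests.<table>" are created on
// first use, so the full set of metrics cannot be predefined.
class DynamicCounters {
    private final ConcurrentHashMap<String, LongAdder> counters =
            new ConcurrentHashMap<>();

    // Increment the counter for a dynamically named key, creating it on demand.
    void incr(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Read the current value; unknown keys read as zero.
    long get(String key) {
        LongAdder a = counters.get(key);
        return a == null ? 0L : a.sum();
    }
}
```

A region server could call something like incr("requests." + tableName) on each request; the same pattern covers column-family or job-id keys.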

For troubleshooting, what people usually do today is: 1) use the current machine-level metrics to find out which machine has the issue; 2) go to that machine and analyze the local log.



The characteristics of this kind of metrics:
 
1. They can't be predefined; the keys, such as table name or job id, are dynamic.
2. The number of such metrics could be much larger than what the current metrics framework can handle.
3. We don't have a scenario that requires near-real-time query support; the delay from when a metric is generated to when it is available for query can be around an hour.
4. How the data is consumed is highly application specific.

Some ideas:

1. Provide an interface for any application to log data.
2. The metrics would be written to log files. The log files or log entries would be loaded into HBase or HDFS asynchronously, possibly on a separate cluster.
3. To consume the data, an application could run a MapReduce job on the log files for aggregation, or do random reads directly from HBase.
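The three ideas above could be sketched roughly as follows (all names and the entry format are hypothetical; the real interface would be worked out on this jira). Idea 1 is the small logging interface; idea 2 is a file-backed implementation that appends tab-separated entries for a separate process to bulk-load into HBase/HDFS later; idea 3 would then aggregate those entries with a MapReduce job or read them back from HBase:

```java
import java.io.IOException;
import java.io.Writer;

// Idea 1 (sketch): a minimal interface any application can use to log
// a dynamic metric entry; the backing store is abstracted away.
interface MetricsLogger {
    void log(String key, long value, long timestampMillis);
}

// Idea 2 (sketch): entries are appended as tab-separated lines to a local
// log, to be shipped to HBase/HDFS asynchronously. Idea 3 would aggregate
// these lines with a MapReduce job, or read them back from HBase directly.
class FileMetricsLogger implements MetricsLogger {
    private final Writer out;

    FileMetricsLogger(Writer out) {
        this.out = out;
    }

    @Override
    public void log(String key, long value, long timestampMillis) {
        try {
            out.write(timestampMillis + "\t" + key + "\t" + value + "\n");
        } catch (IOException e) {
            // In this sketch, metrics logging is best-effort: never fail the caller.
        }
    }
}
```

Keeping the interface down to (key, value, timestamp) is deliberate: since consumption is application specific (characteristic 4), any aggregation or schema lives on the consumer side.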


Comments?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4366) dynamic metrics logging

Posted by "Elliott Clark (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454716#comment-13454716 ] 

Elliott Clark commented on HBASE-4366:
--------------------------------------

It seems this has been addressed in 0.94+: we now have per-region, per-column-family, and per-block-type metrics.

Are there other requirements, or has this been completed?
                
