Posted to issues@hbase.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2011/07/28 19:59:09 UTC

[jira] [Created] (HBASE-4145) Provide metrics for hbase client

Provide metrics for hbase client
--------------------------------

                 Key: HBASE-4145
                 URL: https://issues.apache.org/jira/browse/HBASE-4145
             Project: HBase
          Issue Type: Improvement
            Reporter: Ming Ma
            Assignee: Ming Ma


It is sometimes useful to collect metrics from the HBase client's point of view.

1. scan.next() call latency. Useful for the regular map job scenario, where we want to see the latency as observed on the client side, including RPC time.
2. Region server retry count. A spike could indicate problems with the network or with RS availability.

In the MapReduce scenario, the HBase client runs on the same machines as the RSs, so the MetricsContext is usually available to the HBase client.
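
A minimal sketch of how the per-call latency of scan.next() might be observed from the client side. It wraps a plain java.util.Iterator rather than a real scanner so that it runs without HBase; TimedIterator and its accessor names are hypothetical and are not part of the HBase API.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical wrapper that records how long each next() call takes. */
class TimedIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private long calls;       // number of next() calls observed
  private long totalNanos;  // summed latency of all next() calls
  private long maxNanos;    // slowest single next() call

  TimedIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public T next() {
    long start = System.nanoTime();
    try {
      return delegate.next();
    } finally {
      // Record the latency even if the underlying call throws.
      long elapsed = System.nanoTime() - start;
      calls++;
      totalNanos += elapsed;
      maxNanos = Math.max(maxNanos, elapsed);
    }
  }

  long getCalls() { return calls; }
  long getTotalNanos() { return totalNanos; }
  long getMaxNanos() { return maxNanos; }

  public static void main(String[] args) {
    TimedIterator<String> rows =
        new TimedIterator<>(Arrays.asList("row1", "row2", "row3").iterator());
    while (rows.hasNext()) {
      rows.next();
    }
    System.out.println("next() calls: " + rows.getCalls());
  }
}
```

Because the timing happens around the delegate call, anything the underlying iterator does inside next(), such as an RPC, would be included in the measured latency.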



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087310#comment-13087310 ] 

stack commented on HBASE-4145:
------------------------------

@Ming FYI, there is an existing mechanism for setting arbitrary attributes on Scan, etc., objects: http://hbase.apache.org/xref/org/apache/hadoop/hbase/client/Scan.html#453

> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>
> It is sometimes useful to collect metrics from the HBase client's point of view. This will help in understanding the scan/TableInputFormat map job scenario.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls that the HBase client makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture:
> 1. The metrics framework works for a fixed number of metrics, so it doesn't fit this scenario.
> 2. Use some TBD solution in HBase to capture such dynamic metrics. Assuming HBase offers a way for the client to log this kind of metric, TableInputFormat can pass the MapReduce task ID to the HBase client as an application scan ID, as a small addition to the existing scan API, and the HBase client can log metrics tagged with that ID. That would allow later querying and analysis of the metrics data for a specific MapReduce job.
> 3. Expose via MapReduce counters. This approach lacks certain features: for example, there is no good way to access the metrics per map instance, and the MapReduce framework only sums counter values, so it is tricky to find the maximum of a metric across all mapper instances. However, it might be good enough for now. With this approach, the metric values will be available via MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate MapReduce counters accordingly.
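
The six items under "What to capture" could be tracked by a small per-scanner holder along the following lines. This is only a sketch under assumptions: ScannerMetricsSketch, its method names, and every counter name other than RPC_CALLS and REGIONS (which appear later in the review) are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-scanner metrics holder; not the class from the patch. */
class ScannerMetricsSketch {
  private final String localHost;
  private final Set<String> regions = new HashSet<>();
  private long rpcCalls;
  private long remoteRpcCalls;
  private long rpcRetries;
  private long notServingRegionExceptions;
  private long millisBetweenRpcCalls;
  private long lastRpcMillis = -1;

  ScannerMetricsSketch(String localHost) {
    this.localHost = localHost;
  }

  /** Records one RPC to the region server hosting the given region. */
  void recordRpc(String serverHost, String regionName, long nowMillis) {
    rpcCalls++;
    if (!localHost.equals(serverHost)) {
      remoteRpcCalls++; // calls to an RS on the same machine are excluded
    }
    regions.add(regionName);
    if (lastRpcMillis >= 0) {
      millisBetweenRpcCalls += nowMillis - lastRpcMillis;
    }
    lastRpcMillis = nowMillis;
  }

  void recordRetry() { rpcRetries++; }

  void recordNotServingRegion() { notServingRegionExceptions++; }

  /** Returns the current values keyed by (mostly hypothetical) counter names. */
  Map<String, Long> snapshot() {
    Map<String, Long> values = new LinkedHashMap<>();
    values.put("RPC_CALLS", rpcCalls);
    values.put("MILLIS_BETWEEN_RPC_CALLS", millisBetweenRpcCalls);
    values.put("RPC_RETRIES", rpcRetries);
    values.put("NOT_SERVING_REGION_EXCEPTION", notServingRegionExceptions);
    values.put("REMOTE_RPC_CALLS", remoteRpcCalls);
    values.put("REGIONS", (long) regions.size());
    return values;
  }

  public static void main(String[] args) {
    ScannerMetricsSketch metrics = new ScannerMetricsSketch("host-a");
    metrics.recordRpc("host-a", "region-1", 100L);
    metrics.recordRpc("host-b", "region-2", 130L);
    System.out.println(metrics.snapshot());
  }
}
```

A snapshot like this is a flat name-to-long map, which is the shape a sum-only counter facility can accept; a per-mapper maximum would still need some other channel.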


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117870#comment-13117870 ] 

Ted Yu commented on HBASE-4145:
-------------------------------

I ran the test suite.
I got an intermittent failure in testMasterFailoverWithMockedRITOnDeadRS but couldn't reproduce it in standalone mode.
                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.94.0
>
>         Attachments: HBaseClientSideMetrics.jpg
>
>


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087418#comment-13087418 ] 

Ming Ma commented on HBASE-4145:
--------------------------------

Ah, thanks for pointing this out, Stack. We can use this for #3. The ClientScanner will call scan.setAttribute with well-defined metrics property names. TableInputFormat will call scan.getAttribute to access the metric values and pass them on to the MapReduce framework as counters.
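
A rough sketch of the attribute hand-off described above, assuming metric values are encoded as 8-byte longs. AttributeHolder is a stand-in that only mimics the setAttribute(String, byte[]) / getAttribute(String) pair so the example runs without HBase; the attribute name and the helper class are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Stand-in for an object carrying named byte[] attributes, such as a Scan. */
class AttributeHolder {
  private final Map<String, byte[]> attributes = new HashMap<>();

  void setAttribute(String name, byte[] value) {
    attributes.put(name, value);
  }

  byte[] getAttribute(String name) {
    return attributes.get(name);
  }
}

class ScanAttributeMetricsDemo {
  // Hypothetical attribute name; the patch defines its own names.
  static final String RPC_CALLS_ATTR = "scan.metrics.rpcCalls";

  /** Scanner side: publish the current value as an attribute. */
  static void publish(AttributeHolder scan, long rpcCalls) {
    byte[] encoded = ByteBuffer.allocate(Long.BYTES).putLong(rpcCalls).array();
    scan.setAttribute(RPC_CALLS_ATTR, encoded);
  }

  /** Reader side: decode the value, treating a missing attribute as zero. */
  static long read(AttributeHolder scan) {
    byte[] raw = scan.getAttribute(RPC_CALLS_ATTR);
    return raw == null ? 0L : ByteBuffer.wrap(raw).getLong();
  }

  public static void main(String[] args) {
    AttributeHolder scan = new AttributeHolder();
    publish(scan, 42L);
    System.out.println("rpc calls read back: " + read(scan));
  }
}
```

The reader treats a missing attribute as zero, which fits a design where metrics collection is optional.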


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116989#comment-13116989 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2149
-----------------------------------------------------------


Nice work.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4986>

    Should be declared as implementing VersionedWritable.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4984>

    It is a bit hard to read this counter.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4985>

    This is the count of regions scanned, right?
    If so, please name it that way.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4987>

    mb should be included in the exception.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4988>

    Why do we need this check again?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4989>

    Value of version should be included here.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4990>

    I think we should have an else block where the unsupported mb is logged.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
<https://reviews.apache.org/r/1674/#comment4991>

    This name doesn't really match the constant above. I think "HBase mapreduce Counters" would be better.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
<https://reviews.apache.org/r/1674/#comment4992>

    This should not be a tongue twister.
    How about naming it retrieveGetCounterWithStrings ?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
<https://reviews.apache.org/r/1674/#comment4993>

    Shall we create the Object array outside the for loop and only fill in the metric name here?


- Ted




                

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116884#comment-13116884 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/
-----------------------------------------------------------

(Updated 2011-09-28 23:03:54.523337)


Review request for hbase.


Changes
-------

Thanks, Todd.

Renamed the counters from COUNT_OF_RPC_CALLS to RPC_CALLS, etc.


Summary
-------

1. Collect client-side scan-related metrics during the scan operation. Collection is turned off by default.
2. TableInputFormat enables metrics collection on the scan and passes the data to the MapReduce framework. This only works with the new mapreduce APIs, which allow TableInputFormat to access the MapReduce Counter.
3. Clean up some minor issues in TableInputFormat as well as in test code.


This addresses bug hbase-4145.
    https://issues.apache.org/jira/browse/hbase-4145


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 

Diff: https://reviews.apache.org/r/1674/diff


Testing
-------

1. Verified on a small cluster.
2. Existing unit tests.
3. Added new tests.


Thanks,

Ming


                

        

[jira] [Updated] (HBASE-4145) Provide metrics for hbase client

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4145:
--------------------------

    Fix Version/s: 0.94.0
    

        

[jira] [Updated] (HBASE-4145) Provide metrics for hbase client

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4145:
--------------------------

    Release Note: 
Scan-related metrics are collected from the HBase client's point of view.
These metrics are exposed as MapReduce counters.
    

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117609#comment-13117609 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------



bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 48
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line48>
bq.  >
bq.  >     Should be declared as implementing VersionedWritable.

The issue with VersionedWritable is it throws VersionMismatchException if the version doesn't match. 

  public void readFields(DataInput in) throws IOException {
    byte version = in.readByte();                 // read version
    if (version != getVersion())
      throw new VersionMismatchException(getVersion(), version);
  }

I want to make it backward compatible, so that any version <= getVersion() is supported. The program could catch VersionMismatchException; however, there is no way to find out the expectedVersion and foundVersion, because they are private members.

public class VersionMismatchException extends IOException {

  private byte expectedVersion;
  private byte foundVersion;
...
}

Any other suggestions? Or is this something that needs to be fixed in VersionedWritable and VersionMismatchException?
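
One possible shape for the backward-compatible check described above, sketched with plain java.io types so it runs without Hadoop. The class names, the two-field layout, and the rule that version 1 lacks the regions field are all assumptions made for illustration; this is not the code from the patch.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical exception that exposes both versions through getters. */
class UnsupportedVersionException extends IOException {
  private final byte supportedVersion;
  private final byte foundVersion;

  UnsupportedVersionException(byte supportedVersion, byte foundVersion) {
    super("Supports versions 1.." + supportedVersion + " but found " + foundVersion);
    this.supportedVersion = supportedVersion;
    this.foundVersion = foundVersion;
  }

  byte getSupportedVersion() { return supportedVersion; }
  byte getFoundVersion() { return foundVersion; }
}

/** Hypothetical metrics record that accepts any version <= its own. */
class VersionedMetricsSketch {
  static final byte CURRENT_VERSION = 2;

  long rpcCalls;
  long regions; // assumed to have been added in version 2

  void write(DataOutput out) throws IOException {
    out.writeByte(CURRENT_VERSION);
    out.writeLong(rpcCalls);
    out.writeLong(regions);
  }

  void readFields(DataInput in) throws IOException {
    byte version = in.readByte();
    if (version < 1 || version > CURRENT_VERSION) {
      // Newer or invalid data: reject, but report both versions.
      throw new UnsupportedVersionException(CURRENT_VERSION, version);
    }
    rpcCalls = in.readLong();
    // Older payloads lack the field, so fall back to a default.
    regions = version >= 2 ? in.readLong() : 0L;
  }
}
```

The version byte is read by the class itself instead of by a strict equality check in a superclass, which is what allows older payloads to be accepted.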


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 76
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line76>
bq.  >
bq.  >     It is a bit hard to read this counter.

Fixed.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 94
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line94>
bq.  >
bq.  >     This is count of regions scanned, right ?
bq.  >     If so, please name it that way.

Todd suggested renaming it from COUNT_OF_REGIONS to REGIONS, since the fact that it is a counter is implicit in the MapReduce framework.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 127
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line127>
bq.  >
bq.  >     mb should be included in the exception.

Fixed.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 133
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line133>
bq.  >
bq.  >     Why do we need this check again ?

Fixed.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 143
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line143>
bq.  >
bq.  >     Value of version should be included here.

Fixed.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 151
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line151>
bq.  >
bq.  >     I think we should have else block where the unsupported mb is logged.

Fixed.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java, line 52
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46380#file46380line52>
bq.  >
bq.  >     This name doesn't really match the constant above. I think "HBase mapreduce Counters" would be better.

The name shows up in the mapreduce UI and reports, and other group names don't include "mapreduce". So I kept it as "HBase Counters" and renamed the variable instead.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java, line 83
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46380#file46380line83>
bq.  >
bq.  >     This should not be a tongue twister.
bq.  >     How about naming it retrieveGetCounterWithStrings ?

Fixed.


bq.  On 2011-09-29 04:18:54, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java, line 232
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46380#file46380line232>
bq.  >
bq.  >     Shall we create the Object array outside the for loop and only fill in Metric name here ?

Fixed. The Object array is not created at all now; the parameters are passed directly.
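
To make the last reply concrete, here is a minimal sketch of resolving a getCounter(String, String) method once and then invoking it with the arguments passed directly. It assumes the record reader reaches the counter through reflection, which this thread does not confirm. SketchCounter, SketchContext, ReflectiveCounterSketch and the RPC_CALLS metric name are hypothetical stand-ins, not HBase or Hadoop classes; only the "HBase Counters" group name and the retrieveGetCounterWithStrings name come from the review.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counter; stands in for a mapreduce Counter. */
class SketchCounter {
    private long value;
    public void increment(long delta) { value += delta; }
    public long getValue() { return value; }
}

/** Hypothetical task context exposing getCounter(String, String). */
class SketchContext {
    private final Map<String, SketchCounter> counters = new HashMap<>();

    public SketchCounter getCounter(String group, String name) {
        return counters.computeIfAbsent(group + ":" + name, k -> new SketchCounter());
    }
}

class ReflectiveCounterSketch {
    /** Resolves getCounter(String, String) once so the loop below can reuse it. */
    static Method retrieveGetCounterWithStrings(Object context) throws NoSuchMethodException {
        return context.getClass().getMethod("getCounter", String.class, String.class);
    }

    /** Adds every metric value to the counter of the same name. */
    static void publish(Object context, Map<String, Long> metrics) {
        try {
            Method getCounter = retrieveGetCounterWithStrings(context);
            for (Map.Entry<String, Long> e : metrics.entrySet()) {
                // Method.invoke takes varargs, so the arguments are passed directly
                // instead of filling a hand-built Object[] on every iteration.
                SketchCounter c =
                    (SketchCounter) getCounter.invoke(context, "HBase Counters", e.getKey());
                c.increment(e.getValue());
            }
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("getCounter(String, String) not usable", ex);
        }
    }
}
```

Resolving the Method outside the loop keeps the per-metric work down to one invoke call.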


- Ming


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2149
-----------------------------------------------------------


On 2011-09-29 21:00:18, Ming Ma wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1674/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-29 21:00:18)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1. Collect client-side scan-related metrics during the scan operation. Collection is turned off by default.
bq.  2. TableInputFormat enables metrics collection on the scan and passes the data to the mapreduce framework. It only works with the new mapreduce APIs, which allow TableInputFormat to access the mapreduce Counter.
bq.  3. Clean up some minor issues in TableInputFormat as well as in the test code.
bq.  
bq.  
bq.  This addresses bug hbase-4145.
bq.      https://issues.apache.org/jira/browse/hbase-4145
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 
bq.  
bq.  Diff: https://reviews.apache.org/r/1674/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  1. Verified on a small cluster.
bq.  2. Existing unit tests.
bq.  3. Added new tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.
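
The summary in the review request above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: ScanMetricsSketch, MeteredScannerSketch and CounterPublisherSketch are hypothetical classes, and the RPC_CALLS and RPC_RETRIES names are made up for the example (REGIONS is the name agreed on in the review). The sketch shows collection being off unless requested, and the collected values being added into job-level counters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for per-scan client-side metrics; not the real ScanMetrics. */
class ScanMetricsSketch {
    long rpcCalls;
    long rpcRetries;
    long regions;

    /** Exposes the values by name so they can be copied into counters. */
    Map<String, Long> asMap() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("RPC_CALLS", rpcCalls);
        m.put("RPC_RETRIES", rpcRetries);
        m.put("REGIONS", regions);
        return m;
    }
}

/** Hypothetical scanner that records metrics only when asked to. */
class MeteredScannerSketch {
    private final ScanMetricsSketch metrics; // null when collection is off (the default)

    MeteredScannerSketch(boolean collectMetrics) {
        this.metrics = collectMetrics ? new ScanMetricsSketch() : null;
    }

    /** Called once per RPC; does nothing when collection is off. */
    void onRpc(boolean retry, boolean newRegion) {
        if (metrics == null) {
            return;
        }
        metrics.rpcCalls++;
        if (retry) {
            metrics.rpcRetries++;
        }
        if (newRegion) {
            metrics.regions++;
        }
    }

    ScanMetricsSketch getMetrics() { return metrics; }
}

class CounterPublisherSketch {
    /** Adds one scan's metrics into a job-level counter map; counters only accumulate sums. */
    static void publish(ScanMetricsSketch m, Map<String, Long> jobCounters) {
        if (m == null) {
            return; // metrics were never enabled for this scan
        }
        for (Map.Entry<String, Long> e : m.asMap().entrySet()) {
            jobCounters.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }
}
```

In this sketch a plain client would construct the scanner with collection off, while an input format would turn it on and call publish when the reader finishes.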


                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBaseClientSideMetrics.jpg
>
>
> Sometimes it is useful to get metrics from the HBase client's point of view. This will help in understanding the metrics for the scan/TableInputFormat map job scenario.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls that the HBase client makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture
> 1. The metrics framework works for a fixed number of metrics. It doesn't fit this scenario.
> 2. Use some TBD solution in HBase to capture such dynamic metrics. If we assume there is a solution in HBase that the HBase client can use to log this kind of metric, TableInputFormat can pass the mapreduce task ID as an application scan ID to the HBase client, as a small addition to the existing scan API, and the HBase client can log metrics with that ID. That will allow later querying and analysis of the metrics data for a specific mapreduce job.
> 3. Expose via MapReduce counters. This lacks certain features: for example, there is no good way to access the metrics per map instance, and the MapReduce framework only sums counter values, so it is tricky to find the max of a given metric across all mapper instances. However, it might be good enough for now. With this approach, the metric values will be available via MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate MapReduce counters accordingly.


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117634#comment-13117634 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2178
-----------------------------------------------------------

Ship it!


- Ted


On 2011-09-29 21:36:19, Ming Ma wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1674/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-29 21:36:19)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1. Collect client-side scan-related metrics during the scan operation. Collection is turned off by default.
bq.  2. TableInputFormat enables metrics collection on the scan and passes the data to the mapreduce framework. It only works with the new mapreduce APIs, which allow TableInputFormat to access the mapreduce Counter.
bq.  3. Clean up some minor issues in TableInputFormat as well as in the test code.
bq.  
bq.  
bq.  This addresses bug hbase-4145.
bq.      https://issues.apache.org/jira/browse/hbase-4145
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 
bq.  
bq.  Diff: https://reviews.apache.org/r/1674/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  1. Verified on a small cluster.
bq.  2. Existing unit tests.
bq.  3. Added new tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.


                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.94.0
>
>         Attachments: HBaseClientSideMetrics.jpg
>
>
> Sometimes it is useful to get metrics from the HBase client's point of view. This will help in understanding the metrics for the scan/TableInputFormat map job scenario.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls that the HBase client makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture
> 1. The metrics framework works for a fixed number of metrics. It doesn't fit this scenario.
> 2. Use some TBD solution in HBase to capture such dynamic metrics. If we assume there is a solution in HBase that the HBase client can use to log this kind of metric, TableInputFormat can pass the mapreduce task ID as an application scan ID to the HBase client, as a small addition to the existing scan API, and the HBase client can log metrics with that ID. That will allow later querying and analysis of the metrics data for a specific mapreduce job.
> 3. Expose via MapReduce counters. This lacks certain features: for example, there is no good way to access the metrics per map instance, and the MapReduce framework only sums counter values, so it is tricky to find the max of a given metric across all mapper instances. However, it might be good enough for now. With this approach, the metric values will be available via MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate MapReduce counters accordingly.


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116985#comment-13116985 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2148
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4983>

    Should read 'can be easily'


- Ted


On 2011-09-28 23:03:54, Ming Ma wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1674/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-28 23:03:54)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1. Collect client-side scan-related metrics during the scan operation. Collection is turned off by default.
bq.  2. TableInputFormat enables metrics collection on the scan and passes the data to the mapreduce framework. It only works with the new mapreduce APIs, which allow TableInputFormat to access the mapreduce Counter.
bq.  3. Clean up some minor issues in TableInputFormat as well as in the test code.
bq.  
bq.  
bq.  This addresses bug hbase-4145.
bq.      https://issues.apache.org/jira/browse/hbase-4145
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 
bq.  
bq.  Diff: https://reviews.apache.org/r/1674/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  1. Verified on a small cluster.
bq.  2. Existing unit tests.
bq.  3. Added new tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.


                


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401683#comment-13401683 ] 

Jean-Daniel Cryans commented on HBASE-4145:
-------------------------------------------

I just stumbled upon this code, and it seems there's an issue in {{TableRecordReaderImpl}}. Calling restart() does this:

{code}
public void restart(byte[] firstRow) throws IOException {
  currentScan = new Scan(scan);
{code}

This by itself is fine, since the metrics will be copied from *scan* to *currentScan*, except that it's *currentScan* that has the updated metrics, not *scan*.

In other words, *currentScan* is the object that is used for scanning, so it contains the metrics. If restart() is called, that object is overwritten with the original definition of the {{Scan}}. I think to fix this we could grab the metrics from *currentScan* first, then set them back on the new object.
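
A minimal sketch of the problem and of the suggested fix, using hypothetical stand-ins rather than the real classes: ScanSketch models a Scan whose copy constructor copies its metrics, RecordReaderSketch models the reader, and RPC_CALLS is an invented metric name. How the real code stores metrics on the Scan is not shown in this comment, so the plain map used here is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Scan: the copy constructor also copies the metrics. */
class ScanSketch {
    final Map<String, Long> metrics = new HashMap<>();

    ScanSketch() {}

    ScanSketch(ScanSketch other) {
        metrics.putAll(other.metrics);
    }
}

/** Hypothetical record reader showing the lossy restart and one possible fix. */
class RecordReaderSketch {
    private final ScanSketch scan = new ScanSketch(); // original definition, never updated
    private ScanSketch currentScan = new ScanSketch(scan); // object actually used for scanning

    /** Simulates one RPC made while scanning; only currentScan sees the update. */
    void recordRpc() {
        currentScan.metrics.merge("RPC_CALLS", 1L, Long::sum);
    }

    /** Mirrors the reported behavior: metrics gathered so far are dropped. */
    void restartLossy() {
        currentScan = new ScanSketch(scan);
    }

    /** Grabs the metrics from currentScan first, then sets them back on the new object. */
    void restartPreserving() {
        Map<String, Long> saved = new HashMap<>(currentScan.metrics);
        currentScan = new ScanSketch(scan);
        currentScan.metrics.putAll(saved);
    }

    long rpcCalls() {
        return currentScan.metrics.getOrDefault("RPC_CALLS", 0L);
    }
}
```

With restartLossy() the count returns to zero after a restart, whereas restartPreserving() keeps accumulating across restarts.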
                


        

[jira] [Resolved] (HBASE-4145) Provide metrics for hbase client

Posted by "Ted Yu (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HBASE-4145.
---------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
    


        

[jira] [Updated] (HBASE-4145) Provide metrics for hbase client

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HBASE-4145:
---------------------------

    Description: 
Sometimes it is useful to get metrics from the HBase client's point of view. This will help in understanding the metrics for the scan/TableInputFormat map job scenario.

What to capture, for example, for each ResultScanner object:

1. The number of RPC calls to RSs.
2. The delta time between consecutive RPC calls in the current serialized scan implementation.
3. The number of RPC retries to RSs.
4. The number of NotServingRegionExceptions received.
5. The number of remote RPC calls. This excludes calls that the HBase client makes to an RS on the same machine.
6. The number of regions accessed.


How to capture

1. The metrics framework works for a fixed number of metrics. It doesn't fit this scenario.

2. Use some TBD solution in HBase to capture such dynamic metrics. If we assume there is a solution in HBase that the HBase client can use to log this kind of metric, TableInputFormat can pass the mapreduce task ID as an application scan ID to the HBase client, as a small addition to the existing scan API, and the HBase client can log metrics with that ID. That will allow later querying and analysis of the metrics data for a specific mapreduce job.

3. Expose via MapReduce counters. This lacks certain features: for example, there is no good way to access the metrics per map instance, and the MapReduce framework only sums counter values, so it is tricky to find the max of a given metric across all mapper instances. However, it might be good enough for now. With this approach, the metric values will be available via MapReduce counters.
a) Have ResultScanner return a new ResultScannerMetrics interface.
b) TableInputFormat will access data from ResultScannerMetrics and populate MapReduce counters accordingly.
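
The limitation mentioned in option 3 can be illustrated with a small sketch. It assumes nothing about the MapReduce API itself; CounterAggregationSketch and the per-mapper retry values are hypothetical. It shows that two very different distributions of a per-mapper metric produce the same summed counter, so the maximum cannot be recovered from the sum.

```java
import java.util.List;

class CounterAggregationSketch {
    /** What a sum-only counter reports across all mapper instances. */
    static long sum(List<Long> perMapper) {
        return perMapper.stream().mapToLong(Long::longValue).sum();
    }

    /** The per-mapper maximum, which a sum-only counter cannot express. */
    static long max(List<Long> perMapper) {
        return perMapper.stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}
```

For example, retry counts of (0, 0, 12) and (4, 4, 4) both sum to 12, yet the first case points at a single troubled mapper and the second at a uniform background rate.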







  was:
Sometimes it is useful to get some metrics from the HBase client's point of view.

1. scan.next() call latency. Useful for the regular map job scenario, where we want to see the client-side latency, including RPC.
2. Region server retry count. A spike could indicate issues with the network or RS availability.

In the mapreduce scenario, the HBase client will run on the same machines as the RSs, so the MetricsContext is usually available to the HBase client.






        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117611#comment-13117611 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------



bq.  On 2011-09-29 03:48:24, Ted Yu wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 43
bq.  > <https://reviews.apache.org/r/1674/diff/3/?file=46268#file46268line43>
bq.  >
bq.  >     Should read 'can be easily'

Fixed.


- Ming


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2148
-----------------------------------------------------------




                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBaseClientSideMetrics.jpg
>
>
> Sometimes it is useful to get metrics from the HBase client's point of view. This helps in understanding scan/TableInputFormat map job scenarios.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls the HBase client makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture:
> 1. The metrics framework works for a fixed number of metrics, so it doesn't fit this scenario.
> 2. Use some TBD solution in HBase to capture such dynamic metrics. Assuming HBase provides a way for the client to log such metrics, TableInputFormat can pass the MapReduce task ID to the HBase client as an application scan ID, as a small addition to the existing scan API, and the HBase client can log metrics under that ID. That would allow later query and analysis of the metrics data for a specific MapReduce job.
> 3. Expose via MapReduce counters. This lacks certain features: for example, there is no good way to access the metrics per map instance, and the MapReduce framework only sums counter values, so it is tricky to find the max of a metric across all mapper instances. However, it might be good enough for now. With this approach, the metric values will be available via MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate MapReduce counters accordingly.
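
Option 3 in the description can be sketched roughly as follows. This is a minimal, self-contained model and not the HBase or Hadoop API: CounterSink, ScannerMetricsSketch, publish and the counter names are hypothetical stand-ins for the MapReduce counter facility and the proposed ResultScannerMetrics interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterSketch {
    /** Hypothetical stand-in for a job-wide counter facility that can only add. */
    public static class CounterSink {
        private final Map<String, Long> totals = new LinkedHashMap<>();

        public void increment(String name, long delta) {
            totals.merge(name, delta, Long::sum);
        }

        public long get(String name) {
            return totals.getOrDefault(name, 0L);
        }
    }

    /** Hypothetical per-scanner metrics, loosely following the list in the issue. */
    public static class ScannerMetricsSketch {
        public long rpcCalls;
        public long rpcRetries;
        public long remoteRpcCalls;

        public Map<String, Long> asMap() {
            Map<String, Long> m = new LinkedHashMap<>();
            m.put("RPC_CALLS", rpcCalls);
            m.put("RPC_RETRIES", rpcRetries);
            m.put("REMOTE_RPC_CALLS", remoteRpcCalls);
            return m;
        }
    }

    /** What a record reader might do when its scanner closes: add each metric to the sink. */
    public static void publish(ScannerMetricsSketch metrics, CounterSink sink) {
        for (Map.Entry<String, Long> e : metrics.asMap().entrySet()) {
            sink.increment(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        CounterSink jobCounters = new CounterSink();

        ScannerMetricsSketch mapper1 = new ScannerMetricsSketch();
        mapper1.rpcCalls = 10;
        mapper1.rpcRetries = 1;

        ScannerMetricsSketch mapper2 = new ScannerMetricsSketch();
        mapper2.rpcCalls = 4;
        mapper2.rpcRetries = 7;

        publish(mapper1, jobCounters);
        publish(mapper2, jobCounters);

        // Only job-wide sums are visible here; the per-mapper values are gone.
        System.out.println("RPC_CALLS=" + jobCounters.get("RPC_CALLS"));
        System.out.println("RPC_RETRIES=" + jobCounters.get("RPC_RETRIES"));
    }
}
```

Because the sink only adds, the job-level RPC_RETRIES total says nothing about which mapper retried most, which is the "max is tricky" limitation the description mentions.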

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116687#comment-13116687 ] 

Todd Lipcon commented on HBASE-4145:
------------------------------------

This is nice stuff. I haven't looked at the code yet, but the feature seems very useful. One small nit from the screenshot: I think we can rename the counters from "COUNT_OF_FOO" to just "FOOS", since the fact that it's a COUNT_OF or SUM_OF is implicit in it being a counter. E.g., we have HDFS_BYTES_READ, not COUNT_OF_HDFS_BYTES_READ.
                

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117612#comment-13117612 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/
-----------------------------------------------------------

(Updated 2011-09-29 21:00:18.525989)


Review request for hbase.


Changes
-------

Thanks for the review, Ted, Ram. Most are fixed. Please find comments inline.


Summary
-------

1. Collect client-side scan-related metrics during scan operations. Collection is turned off by default.
2. TableInputFormat enables metrics collection on the scan and passes the data to the MapReduce framework. This only works with the new mapreduce APIs, which allow TableInputFormat to access MapReduce Counters.
3. Clean up some minor issues in TableInputFormat as well as in test code.
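
As a rough illustration of item 1 (collection that is off unless the caller opts in), here is a minimal self-contained sketch. It does not use the HBase classes: ScanSketch, ScannerSketch and the attribute key "scan.metrics.enable" are hypothetical, chosen only to mirror the idea of flagging a scan through a free-form attribute.

```java
import java.util.HashMap;
import java.util.Map;

public class OptInMetricsSketch {
    /** Hypothetical attribute key; the name used by the actual patch may differ. */
    public static final String METRICS_ENABLE_KEY = "scan.metrics.enable";

    /** Toy scan description carrying free-form attributes. */
    public static class ScanSketch {
        private final Map<String, String> attributes = new HashMap<>();

        public void setAttribute(String key, String value) {
            attributes.put(key, value);
        }

        public String getAttribute(String key) {
            return attributes.get(key);
        }
    }

    /** Toy scanner that counts calls only when the scan opted in. */
    public static class ScannerSketch {
        private final boolean metricsEnabled;
        private long rpcCalls;

        public ScannerSketch(ScanSketch scan) {
            // A missing attribute parses as false, so collection is off by default.
            this.metricsEnabled = Boolean.parseBoolean(scan.getAttribute(METRICS_ENABLE_KEY));
        }

        public void next() {
            // A real scanner would issue an RPC here; this sketch only counts it.
            if (metricsEnabled) {
                rpcCalls++;
            }
        }

        public long getRpcCalls() {
            return rpcCalls;
        }
    }

    public static void main(String[] args) {
        ScanSketch plain = new ScanSketch();
        ScannerSketch plainScanner = new ScannerSketch(plain);
        plainScanner.next();

        ScanSketch measured = new ScanSketch();
        measured.setAttribute(METRICS_ENABLE_KEY, "true");
        ScannerSketch measuredScanner = new ScannerSketch(measured);
        measuredScanner.next();
        measuredScanner.next();

        System.out.println("default scan: " + plainScanner.getRpcCalls());
        System.out.println("opted-in scan: " + measuredScanner.getRpcCalls());
    }
}
```

Keeping the flag on the scan itself means callers that never set it pay no bookkeeping cost, while a caller such as an input format can switch it on for the scans it creates.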


This addresses bug hbase-4145.
    https://issues.apache.org/jira/browse/hbase-4145


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 

Diff: https://reviews.apache.org/r/1674/diff


Testing
-------

1. Verified on a small cluster.
2. Existing unit tests.
3. Added new tests.


Thanks,

Ming


                

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116588#comment-13116588 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/
-----------------------------------------------------------

(Updated 2011-09-28 16:35:57.691899)


Review request for hbase.


Changes
-------

Merged with latest trunk.
Ran unit tests a couple more times.


Summary
-------

1. Collect client-side scan-related metrics during scan operations. Collection is turned off by default.
2. TableInputFormat enables metrics collection on the scan and passes the data to the MapReduce framework. This only works with the new mapreduce APIs, which allow TableInputFormat to access MapReduce Counters.
3. Clean up some minor issues in TableInputFormat as well as in test code.


This addresses bug hbase-4145.
    https://issues.apache.org/jira/browse/hbase-4145


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 

Diff: https://reviews.apache.org/r/1674/diff


Testing
-------

1. Verified on a small cluster.
2. Existing unit tests.
3. Added new tests.


Thanks,

Ming


                

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117608#comment-13117608 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------



bq.  On 2011-09-29 13:33:05, ramkrishna vasudevan wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java, line 2
bq.  > <https://reviews.apache.org/r/1674/diff/4/?file=46377#file46377line2>
bq.  >
bq.  >     I think we need not give copyright information.

Fixed.


- Ming


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2150
-----------------------------------------------------------




                

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117871#comment-13117871 ] 

Ted Yu commented on HBASE-4145:
-------------------------------

Integrated to TRUNK.

Thanks for the patch, Ming.
                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.94.0
>
>         Attachments: HBaseClientSideMetrics.jpg
>
>

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117621#comment-13117621 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2175
-----------------------------------------------------------


Only one minor comment left, see below.
If test suite passes, I would vote +1


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment5068>

    VersionedWritable from hadoop left something to be desired.
    Such discussion came up during HBASE-2195.
    We can address this elsewhere.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment5069>

    I meant that this counter can be named REGIONS_SCANNED because its value may be lower than the total number of regions in the table(s).


- Ted




                

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117633#comment-13117633 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/
-----------------------------------------------------------

(Updated 2011-09-29 21:36:19.875315)


Review request for hbase.


Changes
-------

Changed the counter name from REGIONS to REGIONS_SCANNED.
Tested with unit tests as well as on a real cluster.
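
To illustrate why REGIONS_SCANNED can be lower than the number of regions in the table, here is a small self-contained sketch. It is only a model: Region, its integer key ranges and regionsScanned are hypothetical simplifications, not HBase classes.

```java
import java.util.Arrays;
import java.util.List;

public class RegionsScannedSketch {
    /** Hypothetical region covering the half-open key range [start, end). */
    public static class Region {
        public final int start;
        public final int end;

        public Region(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True if this region holds any key in the scan range [scanStart, scanStop). */
        public boolean overlaps(int scanStart, int scanStop) {
            return start < scanStop && scanStart < end;
        }
    }

    /** Counts the regions a scan over [scanStart, scanStop) would have to visit. */
    public static long regionsScanned(List<Region> table, int scanStart, int scanStop) {
        long count = 0;
        for (Region r : table) {
            if (r.overlaps(scanStart, scanStop)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Region> table = Arrays.asList(
            new Region(0, 10), new Region(10, 20), new Region(20, 30), new Region(30, 40));

        // A scan over keys [12, 25) touches only the second and third regions.
        System.out.println("regions in table: " + table.size());
        System.out.println("REGIONS_SCANNED: " + regionsScanned(table, 12, 25));
    }
}
```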


Summary
-------

1. Collect client-side scan-related metrics during scan operations. Collection is turned off by default.
2. TableInputFormat enables metrics collection on the scan and passes the data to the MapReduce framework. This only works with the new mapreduce APIs, which allow TableInputFormat to access MapReduce Counters.
3. Clean up some minor issues in TableInputFormat as well as in test code.


This addresses bug hbase-4145.
    https://issues.apache.org/jira/browse/hbase-4145


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 

Diff: https://reviews.apache.org/r/1674/diff


Testing
-------

1. Verified on a small cluster.
2. Existing unit tests.
3. Added new tests.


Thanks,

Ming


                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBaseClientSideMetrics.jpg
>
>
> Sometimes it is useful to get metrics from the HBase client's point of view. This will help in understanding the metrics for the scan/TableInputFormat map job scenario.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls that the HBase client makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture:
> 1. The metrics framework works for a fixed number of metrics, so it doesn't fit this scenario.
> 2. Use some TBD solution in HBase to capture such dynamic metrics. If we assume there is a solution in HBase that the HBase client can use to log such metrics, TableInputFormat can pass the mapreduce task ID to the HBase client as an application scan ID, as a small addition to the existing scan API, and the HBase client can log metrics tagged with that ID. That will allow later querying and analysis of the metrics data for a specific MapReduce job.
> 3. Expose via MapReduce counters. This approach lacks certain features: for example, there is no good way to access the metrics per map instance, and the MapReduce framework only sums counter values, so it is tricky to find the max of a metric across all mapper instances. However, it might be good enough for now. With this approach, the metric values will be available via MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate MapReduce counters accordingly.
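
The limitation noted under option 3 can be illustrated with a small sketch, assuming hypothetical per-mapper retry counts (the numbers and the names jobTotal and worstMapper are made up for illustration, not taken from the patch). When the framework only reports the sum, a job with evenly spread retries and a job with one badly affected mapper look the same.

```java
import java.util.Arrays;

/** Illustrates why a sum-only counter hides the per-mapper maximum. */
public class CounterAggregationSketch {

    /** What a sum-only counter framework reports for the whole job. */
    static long jobTotal(long[] perMapperValues) {
        return Arrays.stream(perMapperValues).sum();
    }

    /** What would need per-mapper access to compute. */
    static long worstMapper(long[] perMapperValues) {
        return Arrays.stream(perMapperValues).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] evenRetries = {5, 5, 5, 5};     // hypothetical: retries spread evenly
        long[] skewedRetries = {0, 0, 0, 20};  // hypothetical: one mapper hit hard

        // Job-level totals are identical: 20 vs 20
        System.out.println(jobTotal(evenRetries) + " vs " + jobTotal(skewedRetries));
        // Per-mapper maxima differ: 5 vs 20
        System.out.println(worstMapper(evenRetries) + " vs " + worstMapper(skewedRetries));
    }
}
```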


        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117295#comment-13117295 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/#review2150
-----------------------------------------------------------



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
<https://reviews.apache.org/r/1674/#comment4994>

    I think we need not give copyright information.


- ramkrishna


On 2011-09-28 23:03:54, Ming Ma wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1674/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-28 23:03:54)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1. Collect client-side scan-related metrics during scan operations. Collection is turned off by default.
bq.  2. TableInputFormat enables metrics collection on the scan and passes the data to the MapReduce framework. This only works with the new mapreduce APIs, which give TableInputFormat access to MapReduce Counters.
bq.  3. Clean up some minor issues in TableInputFormat as well as in test code.
bq.  
bq.  
bq.  This addresses bug hbase-4145.
bq.      https://issues.apache.org/jira/browse/hbase-4145
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1176942 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1176942 
bq.  
bq.  Diff: https://reviews.apache.org/r/1674/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  1. Verified on a small cluster.
bq.  2. Existing unit tests.
bq.  3. Added new tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.


                

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118134#comment-13118134 ] 

Hudson commented on HBASE-4145:
-------------------------------

Integrated in HBase-TRUNK #2272 (See [https://builds.apache.org/job/HBase-TRUNK/2272/])
    HBASE-4145 Provide metrics for hbase client, add ScanMetrics.java
HBASE-4145  Provide metrics for hbase client (Ming Ma)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.94.0
>
>         Attachments: HBaseClientSideMetrics.jpg
>
>

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092591#comment-13092591 ] 

jiraposter@reviews.apache.org commented on HBASE-4145:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

1. Collect client-side scan-related metrics during scan operations. Collection is turned off by default.
2. TableInputFormat enables metrics collection on the scan and passes the data to the MapReduce framework. This only works with the new mapreduce APIs, which give TableInputFormat access to MapReduce Counters.
3. Clean up some minor issues in TableInputFormat as well as in test code.


This addresses bug hbase-4145.
    https://issues.apache.org/jira/browse/hbase-4145


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java PRE-CREATION 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1162612 
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan.java 1162612 

Diff: https://reviews.apache.org/r/1674/diff


Testing
-------

1. Verified on a small cluster.
2. Existing unit tests.
3. Added new tests.


Thanks,

Ming




        

[jira] [Updated] (HBASE-4145) Provide metrics for hbase client

Posted by "Ming Ma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HBASE-4145:
---------------------------

    Attachment: HBaseClientSideMetrics.jpg

Here is a screenshot of what it looks like in the JobTracker UI.
                

        

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401734#comment-13401734 ] 

Zhihong Ted Yu commented on HBASE-4145:
---------------------------------------

@J-D:
HBASE-6277 has been created to address your finding.
                