Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2012/05/10 00:53:49 UTC

[jira] [Created] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

Todd Lipcon created HBASE-5980:
----------------------------------

             Summary: Scanner responses from RS should include metrics on rows/KVs filtered
                 Key: HBASE-5980
                 URL: https://issues.apache.org/jira/browse/HBASE-5980
             Project: HBase
          Issue Type: Improvement
          Components: client, metrics, regionserver
    Affects Versions: 0.96.0
            Reporter: Todd Lipcon
            Priority: Minor


Currently it's difficult to know, when issuing a filter, what percentage of rows were skipped by that filter. We should expose some basic counters back to the client scanner object. For example:
- number of rows filtered by row key alone (filterRowKey())
- number of times each filter response was returned by filterKeyValue() - corresponding to Filter.ReturnCode

What would be slickest is if this could actually return a tree of counters for cases where FilterList or other combining filters are used. But a top-level is a good start.
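A minimal Java sketch of the counters proposed above. FilterCounters is a hypothetical class (not part of HBase), and the ReturnCode enum here is a stand-in that mirrors only some constants of Filter.ReturnCode so the sketch compiles on its own. Each node counts filterRowKey() rejections and filterKeyValue() return codes. A combining filter such as FilterList would own one child node per wrapped filter, which gives the tree of counters described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Filter.ReturnCode; only a subset of the real
// constants is mirrored so this sketch has no HBase dependency.
enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT }

// Hypothetical per-filter counter node. Combining filters (e.g. a FilterList)
// own one child node per wrapped filter, forming a tree of counters.
final class FilterCounters {
  private final String filterName;
  private long rowsFilteredByRowKey;
  private final Map<ReturnCode, Long> returnCodeCounts = new EnumMap<>(ReturnCode.class);
  private final List<FilterCounters> children = new ArrayList<>();

  FilterCounters(String filterName) { this.filterName = filterName; }

  // Would be called each time filterRowKey() rejects a row.
  void recordRowKeyFiltered() { rowsFilteredByRowKey++; }

  // Would be called with the value returned by each filterKeyValue() call.
  void recordReturnCode(ReturnCode code) { returnCodeCounts.merge(code, 1L, Long::sum); }

  // Creates and attaches a counter node for a wrapped filter.
  FilterCounters addChild(String childName) {
    FilterCounters child = new FilterCounters(childName);
    children.add(child);
    return child;
  }

  long rowsFilteredByRowKey() { return rowsFilteredByRowKey; }

  long count(ReturnCode code) { return returnCodeCounts.getOrDefault(code, 0L); }

  List<FilterCounters> children() { return Collections.unmodifiableList(children); }

  // Renders the tree, indenting two spaces per nesting level.
  String render() {
    StringBuilder sb = new StringBuilder();
    render(sb, 0);
    return sb.toString();
  }

  private void render(StringBuilder sb, int depth) {
    for (int i = 0; i < depth; i++) sb.append("  ");
    sb.append(filterName)
        .append(" rowKeyFiltered=").append(rowsFilteredByRowKey)
        .append(' ').append(returnCodeCounts) // EnumMap prints in enum order
        .append('\n');
    for (FilterCounters child : children) child.render(sb, depth + 1);
  }
}

class FilterCountersDemo {
  public static void main(String[] args) {
    FilterCounters list = new FilterCounters("FilterList(MUST_PASS_ALL)");
    FilterCounters prefix = list.addChild("PrefixFilter");
    FilterCounters value = list.addChild("SingleColumnValueFilter");
    prefix.recordRowKeyFiltered();
    value.recordReturnCode(ReturnCode.INCLUDE);
    value.recordReturnCode(ReturnCode.SKIP);
    value.recordReturnCode(ReturnCode.SKIP);
    System.out.print(list.render());
  }
}
```

Keying the counts by return code in an EnumMap means a new return code needs no new field. How a region server would serialize such a tree into the scanner response is left open here.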

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272067#comment-13272067 ] 

Lars Hofhansl edited comment on HBASE-5980 at 5/10/12 4:10 AM:
---------------------------------------------------------------

Do you see this as a measure of how many KVs a scanner had to touch (and then filter)?

Asking, because if the filter uses seek hints it is impossible to tell how many KVs it caused to skip.

                
                  

        

[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272073#comment-13272073 ] 

Todd Lipcon commented on HBASE-5980:
------------------------------------

Yep, some proxy for the efficiency of the filter. Newer HBase users often apply filters expecting them to work like SQL WHERE clauses, and don't realize that even though their scan returns only 100 rows, it is in fact reading thousands or millions off disk.
                

        

[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272067#comment-13272067 ] 

Lars Hofhansl commented on HBASE-5980:
--------------------------------------

Do you see this as a measure of how many KVs a scanner had to touch (and then filter)?

Asking, because if the filter uses seek hints it is impossible to tell how many KVs it caused to skip.

                

        

[jira] [Commented] (HBASE-5980) Scanner responses from RS should include metrics on rows/KVs filtered

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272227#comment-13272227 ] 

Anoop Sam John commented on HBASE-5980:
---------------------------------------

Nice idea. :) So this metric would give a value something like: rows (or KVs) included in the result / total rows (or KVs) the scanner actually touched, expressed as a percentage.

When we seek using hints from the filter, or use reseeks from coprocessors, this percentage will be really good (close to 100%).
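The ratio described above can be sketched as follows. ScanEfficiency is a hypothetical name, not an HBase class, and the sketch assumes the region server reports two counts per scan: items (rows or KVs) returned to the client and items it actually examined. As Lars points out, KVs jumped over by a seek hint are never examined, so they cannot show up in either count.

```java
// Hypothetical summary a region server could attach to a scan response.
// "examined" counts only rows (or KVs) the scanner actually read; anything
// skipped by a seek hint is never read, so it is invisible to both counts.
final class ScanEfficiency {
  private final long returned; // rows or KVs included in the client's result
  private final long examined; // rows or KVs touched, including those filtered out

  ScanEfficiency(long returned, long examined) {
    if (returned < 0 || examined < returned) {
      throw new IllegalArgumentException("expected 0 <= returned <= examined");
    }
    this.returned = returned;
    this.examined = examined;
  }

  // Items the server read but the filter rejected.
  long filteredOut() { return examined - returned; }

  // Share of examined items that made it into the result, as a percentage.
  // An empty scan is treated as 100% efficient because nothing was wasted.
  double percentReturned() {
    return examined == 0 ? 100.0 : 100.0 * returned / examined;
  }
}

class ScanEfficiencyDemo {
  public static void main(String[] args) {
    // Todd's scenario: 100 rows returned after reading a million off disk.
    ScanEfficiency wasteful = new ScanEfficiency(100, 1_000_000);
    // A seek-driven scan touches little beyond what it returns.
    ScanEfficiency seekDriven = new ScanEfficiency(95, 100);
    System.out.printf("wasteful: %.4f%% returned, %d filtered out%n",
        wasteful.percentReturned(), wasteful.filteredOut());
    System.out.printf("seek-driven: %.1f%% returned%n", seekDriven.percentReturned());
  }
}
```

For the 100-out-of-1,000,000 case the ratio is about 0.01 percent, the kind of number that would tell a user the filter is not behaving like an indexed WHERE clause.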

@Todd - Would it be nice to also expose a metric for the amount of data actually fetched from HDFS for this scan (per next() call)? Maybe we could report how many HFile blocks were fetched from HDFS?
                