Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2014/02/26 17:41:25 UTC

[jira] [Commented] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

    [ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913119#comment-13913119 ] 

Lars Hofhansl commented on HBASE-9778:
--------------------------------------

Some further observations.

When we reseek for a column, we pass a KV that would be located just before the first KV for that column. In the various scanners, we then seek forward in the file until we're *past* the KV passed in, then go back one KV, discarding the current KV. So when we seek forward through a file, we'll scan every KV twice.

I'm planning to test passing a special KV so that the scanners can tell when we're *on* the KV we're looking for. For example, when looking for a column we can scan forward until we see the first KV for that row, fam, col, and then we can stop. No need to scan one more, remember the previous, and then go back. For cases with few versions/columns that should shave off a large portion of the time. Will report back.
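
To make the difference concrete, here is a minimal toy model in Java, not HBase code. It treats one row as a sorted list of column qualifiers and counts how often a KV is read under each strategy. The class and method names (ReseekSketch, reseekWithBackStep, reseekStopOnMatch, totalReads) are made up for illustration, and the "read the target again after stepping back" cost is an assumption based on the behavior described above.

```java
import java.util.Arrays;
import java.util.List;

public class ReseekSketch {
    /** Counts how many times the toy scanner reads (decodes/compares) a KV. */
    public static final class Counter { public int reads; }

    /**
     * Models the behavior described above: scan forward until we are past the
     * fake "just before the column" key, step back one KV, then advance again,
     * which reads the target KV a second time.
     */
    public static int reseekWithBackStep(List<String> qualifiers, int start, String column, Counter c) {
        int pos = start;
        while (pos < qualifiers.size()) {
            c.reads++; // read the KV at pos
            if (qualifiers.get(pos).compareTo(column) >= 0) {
                break; // we are now past the fake key
            }
            pos++;
        }
        if (pos < qualifiers.size()) {
            c.reads++; // step back, then move forward again: the target KV is read twice
        }
        return pos;
    }

    /** Models the proposed behavior: stop as soon as we are on the first KV of the column. */
    public static int reseekStopOnMatch(List<String> qualifiers, int start, String column, Counter c) {
        int pos = start;
        while (pos < qualifiers.size()) {
            c.reads++; // read the KV at pos
            if (qualifiers.get(pos).compareTo(column) >= 0) {
                return pos; // on the KV we were looking for, no back step needed
            }
            pos++;
        }
        return pos;
    }

    /** Reseeks to each wanted column in order and returns the total number of KV reads. */
    public static int totalReads(List<String> qualifiers, List<String> wanted, boolean backStep) {
        Counter c = new Counter();
        int pos = 0;
        for (String column : wanted) {
            pos = backStep
                    ? reseekWithBackStep(qualifiers, pos, column, c)
                    : reseekStopOnMatch(qualifiers, pos, column, c);
            pos++; // consume the matched KV before the next reseek
        }
        return c.reads;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        System.out.println("back step:     " + totalReads(row, row, true));
        System.out.println("stop on match: " + totalReads(row, row, false));
    }
}
```

In this toy model, with one version per column and every column selected, the back-step variant reads each KV twice while the stop-on-match variant reads each KV once. Real scanners do more work per KV, so the actual saving may differ.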


> Avoid seeking to next column in ExplicitColumnTracker when possible
> -------------------------------------------------------------------
>
>                 Key: HBASE-9778
>                 URL: https://issues.apache.org/jira/browse/HBASE-9778
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
>
>
> The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list.
> My idea here is to avoid the seeking if we know that there aren't many versions to skip.
> How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call SKIP repeatedly.
> HBASE-9769 has some initial numbers for this approach:
> Interestingly it depends on which column(s) is (are) selected.
> Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. Everything measured in seconds.
> Without patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.5|14.3|14.6|11.1|20.3|
> With patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.4|8.9|9.9|6.4|10.0|
> Variation here was +- 0.2s.
> So with this patch scanning is 2x faster than without in some cases, and never slower. No special hint needed, beyond declaring VERSIONS correctly.
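
The VERSIONS-based hint from the issue description quoted above could be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: the class SkipOrSeekSketch, the Hint enum, the method nextColumnHint, and the cutoff constant are all hypothetical, and the description only floats "1 (or maybe some value < 10)" as the threshold.

```java
public class SkipOrSeekSketch {
    /** Hypothetical stand-in for the tracker's decision; not the real HBase enum. */
    public enum Hint { SKIP, SEEK_NEXT_COL }

    /** Assumed cutoff; the issue description only suggests "maybe some value < 10". */
    public static final int SKIP_THRESHOLD = 10;

    /**
     * With few versions per column, skipping KV by KV is likely cheaper than a
     * seek; with many versions, seeking to the next column is likely cheaper.
     */
    public static Hint nextColumnHint(int maxVersions) {
        return maxVersions < SKIP_THRESHOLD ? Hint.SKIP : Hint.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println("VERSIONS=1   -> " + nextColumnHint(1));
        System.out.println("VERSIONS=100 -> " + nextColumnHint(100));
    }
}
```

The point of the design is that the column family's declared VERSIONS is already known at scan time, so no extra hint from the client is needed.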



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)