Posted to issues@hbase.apache.org by "Raymond Liu (JIRA)" <ji...@apache.org> on 2013/03/06 03:52:13 UTC

[jira] [Updated] (HBASE-8001) Avoid unnecessary lazy seek

     [ https://issues.apache.org/jira/browse/HBASE-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Liu updated HBASE-8001:
-------------------------------

    Attachment: HBASE-8001_onescanner.patch

In the HBASE-8001_onescanner patch, the idea is that when there is only one scanner left, there is no need for lazy seek; a real seek should always be performed. This is the simplest case and the easiest to detect automatically. It does involve switching from lazy seek to a non-lazy reseek, which requires that the scanner is reseekable, and there are several places where this condition could be checked. I chose to do it at the scanner level in this patch. It could be done at the heap level instead, but that would require an extra API from the scanner.

Anyway, this is my first thought on how to reduce unnecessary lazy seeks. When I benchmarked this on a region with a single hfile, overall table scan throughput improved by 8-10%.
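
To illustrate the one-scanner case described above, here is a minimal sketch in Java. It is not the patch and does not use the HBase API: SketchScanner, SketchHeap, requestSeek and the two counters are hypothetical names made up for this example, and keys are assumed to be non-negative longs. For brevity the check is placed in a heap-like wrapper, whereas the patch does it at the scanner level.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a store file scanner; NOT the HBase API.
class SketchScanner {
    private final long[] keys;     // sorted, non-negative keys held by this "file"
    private int pos = 0;           // index of the current key
    private long pendingKey = -1;  // target of a deferred seek, -1 if none
    int realSeeks = 0;             // number of real seeks performed
    int lazySeeks = 0;             // number of seeks that were deferred

    SketchScanner(long... sortedKeys) {
        this.keys = sortedKeys;
    }

    // Real (forward-only) seek: move to the first key >= target.
    void realSeek(long target) {
        realSeeks++;
        pendingKey = -1;
        while (pos < keys.length && keys[pos] < target) {
            pos++;
        }
    }

    // Lazy seek: only remember the target; the real work happens later.
    void lazySeek(long target) {
        lazySeeks++;
        pendingKey = target;
    }

    // Current key, or -1 when exhausted. Forces any deferred seek first.
    long current() {
        if (pendingKey >= 0) {
            realSeek(pendingKey);
        }
        return pos < keys.length ? keys[pos] : -1;
    }
}

// Hypothetical heap-like wrapper that decides between lazy and real seek.
class SketchHeap {
    private final List<SketchScanner> scanners;

    SketchHeap(List<SketchScanner> scanners) {
        this.scanners = scanners;
    }

    void requestSeek(long target) {
        if (scanners.size() == 1) {
            // Only one scanner left: deferring cannot save a real seek,
            // so seek directly and skip the lazy bookkeeping.
            scanners.get(0).realSeek(target);
        } else {
            // Several scanners: defer, since some may never need to seek.
            for (SketchScanner s : scanners) {
                s.lazySeek(target);
            }
        }
    }
}

class LazySeekSketch {
    public static void main(String[] args) {
        SketchScanner only = new SketchScanner(1, 5, 9);
        new SketchHeap(Arrays.asList(only)).requestSeek(5);
        System.out.println(only.current()
            + " lazy=" + only.lazySeeks + " real=" + only.realSeeks);
    }
}
```

With a single scanner the sketch performs one real seek and no lazy seek. With two or more scanners each one records a deferred seek, and the real seek happens only when that scanner's current key is actually read.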
                
> Avoid unnecessary lazy seek
> ---------------------------
>
>                 Key: HBASE-8001
>                 URL: https://issues.apache.org/jira/browse/HBASE-8001
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.5
>            Reporter: Raymond Liu
>            Assignee: Raymond Liu
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8001_onescanner.patch
>
>
> Lazy seek helps to reduce the number of real seeks needed across multiple hfiles, when the kv from a newer hfile is enough to satisfy the query.
> In many cases, however, it just pushes the real seek later and does not reduce the number of real seeks, e.g. when there is only one hfile, or the other storefilescanners have been closed and only one is left, or the scan needs to go through all the versions, or there is only one version of each row and a sequential scan is performed. In these cases, lazy seek just brings extra overhead.
