Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2008/08/01 22:25:31 UTC

[jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-792:
------------------------

    Attachment: 792.patch

Thinking on it, my fix for HBASE-751 introduced this issue. With that fix in place, every request for closestAtOrBefore could end up loading everything out on the filesystem, all of the flushes.

We still need to rewrite this stuff; the number of seeks done per closestAtOrBefore can be astronomical, but this patch takes off some of the heat.

This patch narrows the number of possible candidates that come back.  

It goes first to the memcache to find candidate rows.

While there, it puts any deletes found between the ultimate candidate and the desired row into a new delete Set.  This delete set is then carried down through the walk of the store files.  We add new deletes as we encounter them, so that candidates in older store files don't shine through if they were deleted more recently.
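A minimal sketch of the walk described above, under simplifying assumptions (stores are plain sorted maps ordered newest-first, a null value stands in for a delete marker, and timestamps are ignored); all class and method names here are hypothetical, not the actual patch:

```java
import java.util.*;

public class ClosestBefore {
    // Hypothetical sketch: each "store" maps row -> value, newest store first
    // (memcache equivalent at index 0). A null value marks a delete.
    static String closestAtOrBefore(List<NavigableMap<String, String>> stores,
                                    String target) {
        Set<String> deletes = new HashSet<>();  // carried down through the walk
        String best = null;
        for (NavigableMap<String, String> store : stores) {
            // Walk rows <= target in this store, in descending row order.
            for (Map.Entry<String, String> e :
                     store.headMap(target, true).descendingMap().entrySet()) {
                String row = e.getKey();
                if (best != null && row.compareTo(best) < 0) {
                    break;  // remaining rows are older than the candidate
                }
                if (e.getValue() == null) {
                    deletes.add(row);  // mask this row in older stores
                } else if (!deletes.contains(row)
                           && (best == null || row.compareTo(best) > 0)) {
                    best = row;  // new closest surviving candidate
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> memcache = new TreeMap<>();
        memcache.put("row5", null);           // delete of row5 up in memcache
        NavigableMap<String, String> storefile = new TreeMap<>();
        storefile.put("row3", "v3");
        storefile.put("row5", "v5");          // masked by the newer delete
        System.out.println(closestAtOrBefore(
                Arrays.asList(memcache, storefile), "row7"));  // prints row3
    }
}
```

The point of carrying the delete set is visible in the example: row5 in the older store file would otherwise look like the closest candidate, but the delete seen in the newer layer masks it, so row3 survives.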

> Rewrite getClosestAtOrJustBefore; doesn't scale as currently written
> --------------------------------------------------------------------
>
>                 Key: HBASE-792
>                 URL: https://issues.apache.org/jira/browse/HBASE-792
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 792.patch
>
>
> As currently written, as a table gets bigger, the number of rows .META. needs to keep track of grows.
> As written, our getClosestAtOrJustBefore goes through every storefile and in each picks up any row that could be a possible candidate for closest-before.  It doesn't just get the closest from the storefile, but all keys that are at or before the target.  It's not selective, because at the store file level it can't tell which of the candidates will survive deletes that are sitting in later store files or up in memcache.
> So, if a store file has keys 0-10 and we ask for the row that is closest to or just before 7, it returns rows 0-7, and so on per store file.
> That can get big, and weeding out the wanted key is slow.
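The complaint in the quoted description can be illustrated with a toy sketch (a store file with keys 0-10, target 7; names are hypothetical, not HBase code):

```java
import java.util.*;

public class CandidateBlowup {
    public static void main(String[] args) {
        // Stand-in for one store file: rows 0..10.
        NavigableMap<Integer, String> storeFile = new TreeMap<>();
        for (int row = 0; row <= 10; row++) {
            storeFile.put(row, "v" + row);
        }
        // The unselective approach keeps every row at or before the target:
        // for target 7 that is rows 0..7, eight candidates from this one file.
        SortedMap<Integer, String> candidates = storeFile.headMap(7 + 1);
        System.out.println(candidates.size());      // prints 8
        // Only one of them is actually wanted: the closest row at or before 7.
        System.out.println(storeFile.floorKey(7));  // prints 7
    }
}
```

Multiply the candidate set by the number of store files and the cost of weeding out the single wanted key grows with table size, which is the scaling problem the issue describes.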

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.