
Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2008/08/01 04:56:31 UTC

[jira] Created: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Rewrite getClosestAtOrJustBefore; doesn't scale as currently written
--------------------------------------------------------------------

                 Key: HBASE-792
                 URL: https://issues.apache.org/jira/browse/HBASE-792
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


As currently written, the number of rows .META. needs to keep track of grows as a table gets bigger.

As written, our getClosestAtOrJustBefore goes through every store file and in each picks up any row that could be a possible candidate for closest before.  It doesn't just get the closest from the store file, but all keys that are at or before the target.  It's not selective because it cannot tell, at the store file level, which of the candidates will survive deletes that are sitting in later store files or up in memcache.

So, if a store file has keys 0-10 and we ask for the row that is closest at or just before 7, it returns rows 0-7, and so on per store file.

The candidate set can get big, and weeding out the wanted key gets slow.
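
To make the growth concrete, here is a minimal sketch of the behavior described above.  It is not the HBase code: store files are modeled as plain sorted sets of integer row keys, and the class and method names (CandidateGrowthSketch, storeFile, allCandidates, closestPerFile) are hypothetical.  The closestPerFile variant is shown only for comparison; on its own it ignores deletes, which is exactly why the original code could not be that selective.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class CandidateGrowthSketch {
    /** Hypothetical stand-in for a store file: a sorted set of integer row keys. */
    public static NavigableSet<Integer> storeFile(int firstKey, int lastKey) {
        NavigableSet<Integer> keys = new TreeSet<>();
        for (int k = firstKey; k <= lastKey; k++) {
            keys.add(k);
        }
        return keys;
    }

    /** The behavior the issue describes: every key at or before the target is kept. */
    public static List<Integer> allCandidates(List<NavigableSet<Integer>> files, int target) {
        List<Integer> out = new ArrayList<>();
        for (NavigableSet<Integer> file : files) {
            out.addAll(file.headSet(target, true)); // inclusive of the target itself
        }
        return out;
    }

    /** For comparison only: at most one key per file (this ignores deletes). */
    public static List<Integer> closestPerFile(List<NavigableSet<Integer>> files, int target) {
        List<Integer> out = new ArrayList<>();
        for (NavigableSet<Integer> file : files) {
            Integer closest = file.floor(target);
            if (closest != null) {
                out.add(closest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<NavigableSet<Integer>> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            files.add(storeFile(0, 10));
        }
        System.out.println(allCandidates(files, 7).size());  // 24: rows 0-7 from each of 3 files
        System.out.println(closestPerFile(files, 7).size()); // 3: one row per file
    }
}
```

With three store files holding keys 0-10, a lookup for 7 carries 24 candidates instead of 3, and the gap widens with every extra flush.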

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-792:
------------------------

    Fix Version/s:     (was: 0.20.0)

Moving out of 0.20.0.  Not going to happen unless it's already done as part of hbase-1304 (haven't heard).

[jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-792:
------------------------

    Fix Version/s:     (was: 0.21.0)
                   0.20.0

I think this is being done in 0.20.0 as part of the major refactor.  Bringing it in.

[jira] Commented: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744183#action_12744183 ] 

stack commented on HBASE-792:
-----------------------------

HBASE-1761 rewrote getclosestatorbefore.  The code is much cleaner and more focused on the target key.  We can do this now because of things such as the axiom that deletes only apply to the file that follows.  It no longer carries around bulky Maps of candidates or of deletes (we now have new-style deletes), so it should be more performant.

The one thing left to do is an early-out if we get an answer early in the processing, in memstore say.  I tried to do this as part of HBASE-1761, but it only worked if the client asked for the first row in a region.  We need to make getclosest leverage HRegionInfo when it runs against a meta table: if the target row key falls between the start and end key of the region, the answer is the one we want, so exit.
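
The early-out idea could look roughly like the sketch below.  This is an illustration under stated assumptions, not the HBASE-1761 code: RegionRange is a hypothetical stand-in for the start and end keys that HRegionInfo carries, keys are plain Strings rather than byte arrays, each store (memstore first, then store files) is a sorted map from region start key to range, and deletes are ignored to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class EarlyOutSketch {
    /** Hypothetical stand-in for the start/end keys carried by HRegionInfo. */
    public static final class RegionRange {
        public final String startKey;
        public final String endKey; // empty means "no upper bound" (last region)

        public RegionRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        public boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** The chosen region plus how many stores were consulted to find it. */
    public static final class Result {
        public final RegionRange region;
        public final int storesVisited;

        public Result(RegionRange region, int storesVisited) {
            this.region = region;
            this.storesVisited = storesVisited;
        }
    }

    /** Walks stores newest first and stops once a candidate's range covers the row. */
    public static Result find(List<NavigableMap<String, RegionRange>> newestFirst, String row) {
        RegionRange best = null;
        int visited = 0;
        for (NavigableMap<String, RegionRange> store : newestFirst) {
            visited++;
            Map.Entry<String, RegionRange> e = store.floorEntry(row);
            if (e == null) {
                continue; // nothing at or before the row in this store
            }
            RegionRange candidate = e.getValue();
            if (best == null || candidate.startKey.compareTo(best.startKey) > 0) {
                best = candidate;
            }
            if (best.contains(row)) {
                break; // early out: no older store can hold a closer row
            }
        }
        return new Result(best, visited);
    }

    public static void main(String[] args) {
        NavigableMap<String, RegionRange> memstore = new TreeMap<>();
        memstore.put("m", new RegionRange("m", "t"));
        NavigableMap<String, RegionRange> storeFile = new TreeMap<>();
        storeFile.put("a", new RegionRange("a", "m"));

        List<NavigableMap<String, RegionRange>> stores = new ArrayList<>();
        stores.add(memstore);
        stores.add(storeFile);

        System.out.println(find(stores, "p").storesVisited); // 1: answered from memstore
        System.out.println(find(stores, "c").storesVisited); // 2: had to read the store file
    }
}
```

The point of the range check is that it works for any row inside the region, not only the first one, which is the limitation mentioned above.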

[jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-792:
------------------------

    Fix Version/s:     (was: 0.20.0)
                   0.21.0

Moving it out.  KeyValue changes plus caching may put off the need for this for a while.

[jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-792:
------------------------

    Attachment: 792.patch

Thinking on it, my fix for HBASE-751 introduced this issue.  With that fix in place, every request for closestAtOrBefore could end up loading all that is out on the filesystem, all of the flushes.

We still need to rewrite this stuff; the number of seeks done per closestAtOrBefore can be astronomical but this patch takes off some of the heat.

This patch narrows the number of possible candidates that come back.  

It goes first to the memcache to find candidate rows.

While there, it puts any deletes found between the ultimate candidate and the desired row into a new delete Set.  This delete set is then carried down through the walk of the store files.  We add new deletes as we encounter them so that candidates in older store files don't shine through if they have been deleted earlier in the walk.
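
A minimal sketch of that walk, under simplifying assumptions and not taken from 792.patch: each store (memcache first, then store files from newest to oldest) is modeled as a sorted map from an integer row key to a flag saying whether the entry is a delete marker, with at most one entry per row per store.  The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

public class DeleteSetWalkSketch {
    /**
     * Returns the closest live row at or before target, or null if there is none.
     * Stores must be ordered newest first (memcache, then store files).
     * A map value of true marks a delete for that row; false marks a live row.
     */
    public static Integer closestAtOrBefore(
            List<NavigableMap<Integer, Boolean>> newestFirst, int target) {
        Set<Integer> deletes = new HashSet<>(); // carried down through the whole walk
        Integer best = null;
        for (NavigableMap<Integer, Boolean> store : newestFirst) {
            // Visit rows at or before the target, closest to the target first.
            for (Map.Entry<Integer, Boolean> e
                    : store.headMap(target, true).descendingMap().entrySet()) {
                int row = e.getKey();
                if (best != null && row <= best) {
                    break; // nothing further down this store can beat the candidate
                }
                if (e.getValue()) {
                    deletes.add(row); // masks this row in all older stores
                } else if (!deletes.contains(row)) {
                    best = row; // closest live row seen so far
                    break;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Boolean> memcache = new TreeMap<>();
        memcache.put(7, true);   // delete marker for row 7
        memcache.put(3, false);
        NavigableMap<Integer, Boolean> newerFile = new TreeMap<>();
        newerFile.put(7, false); // masked by the delete in memcache
        newerFile.put(5, false);
        NavigableMap<Integer, Boolean> olderFile = new TreeMap<>();
        olderFile.put(6, false);

        List<NavigableMap<Integer, Boolean>> stores = new ArrayList<>();
        stores.add(memcache);
        stores.add(newerFile);
        stores.add(olderFile);

        System.out.println(closestAtOrBefore(stores, 7)); // 6
    }
}
```

Only the deletes that sit between the current candidate and the target are recorded, so the set stays small compared with keeping every possible candidate from every store.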

[jira] Commented: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619126#action_12619126 ] 

Jim Kellerman commented on HBASE-792:
-------------------------------------

Patch looks good +1

[jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-792:
------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.20.0

This can actually be responsible for slowing down the whole cluster (J-D saw it in 0.18 hbase up on his openspaces cluster).

[jira] Commented: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619134#action_12619134 ] 

stack commented on HBASE-792:
-----------------------------

Committed 792.patch.  Leaving the issue open.  Close after we do the rewrite.

[jira] Assigned: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-792:
---------------------------

    Assignee: stack
