Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/02/09 19:24:59 UTC

[jira] Created: (HBASE-1192) LRU-style map for the block cache

LRU-style map for the block cache
---------------------------------

                 Key: HBASE-1192
                 URL: https://issues.apache.org/jira/browse/HBASE-1192
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: regionserver
            Reporter: Jonathan Gray
            Priority: Blocker
             Fix For: 0.20.0


We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reassigned HBASE-1192:
------------------------------------

    Assignee: Jonathan Gray

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-1192:
-------------------------------

    Attachment: HBASE-1192-ryan.patch

Integrates jgray's previous LRU class, makes it work, and integrates it with HFile.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1192-ryan.patch, hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1192:
---------------------------------

    Attachment: hbase-1192-v1.patch

This takes the baseline LruHashMap from HBASE-1186 and wraps it in an LruBlockCache class, which implements both HeapSize and the new HFile BlockCache interface.

The code is not well tested, but I wanted to get something up ASAP to start tinkering.

As discussed on IRC, block caching is no longer optional across the board.

Note: this patch creates the necessary files in the working directory, not in the HBase src tree, because we're still doing file format testing outside of trunk.
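A minimal sketch of the wrapper shape described above, assuming simplified stand-ins for HeapSize and BlockCache (the two interfaces below are hypothetical and are not the actual HBase signatures). It builds the LRU directly on a HashMap plus a doubly linked list, in the spirit of an LruHashMap, specialized to String keys and ByteBuffer values for brevity. The per-entry cost ignores JVM object overheads.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the interfaces named in the comment.
interface HeapSize {
  long heapSize();
}

interface BlockCache {
  void cacheBlock(String blockName, ByteBuffer buf);
  ByteBuffer getBlock(String blockName);
}

// Sketch: LRU built on a HashMap plus a doubly linked list, evicting
// least-recently-used blocks once a byte budget is exceeded.
class LruBlockCacheSketch implements BlockCache, HeapSize {
  private static final class Node {
    final String key;
    final ByteBuffer value;
    Node prev, next;
    Node(String key, ByteBuffer value) { this.key = key; this.value = value; }
  }

  private final Map<String, Node> index = new HashMap<>();
  private final Node head = new Node(null, null); // sentinel on the most-recently-used side
  private final Node tail = new Node(null, null); // sentinel on the least-recently-used side
  private final long maxBytes;
  private long usedBytes;

  LruBlockCacheSketch(long maxBytes) {
    this.maxBytes = maxBytes;
    head.next = tail;
    tail.prev = head;
  }

  // Rough per-entry cost: 2 bytes per key char plus buffer capacity (no JVM overheads).
  private static long cost(String key, ByteBuffer buf) {
    return 2L * key.length() + buf.capacity();
  }

  private void unlink(Node n) {
    n.prev.next = n.next;
    n.next.prev = n.prev;
  }

  private void addFirst(Node n) {
    n.next = head.next;
    n.prev = head;
    head.next.prev = n;
    head.next = n;
  }

  @Override
  public synchronized void cacheBlock(String blockName, ByteBuffer buf) {
    Node old = index.remove(blockName);
    if (old != null) {
      unlink(old);
      usedBytes -= cost(old.key, old.value);
    }
    Node n = new Node(blockName, buf);
    index.put(blockName, n);
    addFirst(n);
    usedBytes += cost(blockName, buf);
    // Evict from the LRU end until back under budget (a block larger than the
    // whole budget ends up evicting itself as well).
    while (usedBytes > maxBytes && tail.prev != head) {
      Node victim = tail.prev;
      unlink(victim);
      index.remove(victim.key);
      usedBytes -= cost(victim.key, victim.value);
    }
  }

  @Override
  public synchronized ByteBuffer getBlock(String blockName) {
    Node n = index.get(blockName);
    if (n == null) return null;
    unlink(n); // a hit moves the block to the most-recently-used end
    addFirst(n);
    return n.value;
  }

  @Override
  public synchronized long heapSize() {
    return usedBytes;
  }
}
```

With a 100-byte budget and three 40-byte blocks named "a", "b", "c", reading "a" before inserting "c" makes "b" the eviction victim, leaving 84 accounted bytes.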

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705343#action_12705343 ] 

stack commented on HBASE-1192:
------------------------------

@Shalin

Thank you for the pointer to the rich lode on lock-free caches.

@Jon

What do you think? The thread scheduler is not fair in Java. Synchronization could cause thread pile-ups, with some threads left out in the cold and never getting a look-in (the 'bottlenecks', I presume, referenced in the issue). That could push out our 99.5th percentile.
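One hedged illustration of the fairness point: Java's intrinsic (synchronized) locks make no fairness promise, while a java.util.concurrent.locks.ReentrantLock constructed with fair = true favors granting the lock to the longest-waiting thread under contention. The class below is hypothetical. A fair lock bounds starvation at some throughput cost; it does nothing to reduce contention itself.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a cache guarded by a fair ReentrantLock instead of `synchronized`.
class FairLockedCache {
  private final ReentrantLock lock = new ReentrantLock(true); // true = fair ordering policy
  private final Map<String, ByteBuffer> map = new HashMap<>();

  void put(String key, ByteBuffer buf) {
    lock.lock();
    try {
      map.put(key, buf);
    } finally {
      lock.unlock(); // always release, even if put throws
    }
  }

  ByteBuffer get(String key) {
    lock.lock();
    try {
      return map.get(key);
    } finally {
      lock.unlock();
    }
  }

  boolean isFair() {
    return lock.isFair();
  }
}
```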

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Erik Holstad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671960#action_12671960 ] 

Erik Holstad commented on HBASE-1192:
-------------------------------------

In the Javadoc for SoftReference you can find:
"All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references."

This means the behavior can vary from JVM to JVM, so the LRU-ness of the system is clearly questionable.
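For reference, a SoftReference-backed cache typically looks like the minimal sketch below (class and method names are hypothetical; this is not HBase's SoftSortedValueMap). Values are wrapped in SoftReference and registered with a ReferenceQueue so that cleared entries can be purged from the map. The GC, not the cache, decides when and in what order entries disappear, which is exactly the behavior the Javadoc leaves unconstrained.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Sketch of a SoftReference-backed block cache: eviction is entirely up to the GC.
class SoftBlockCache {
  // SoftReference that remembers its key so cleared entries can be removed from the map.
  private static final class KeyedRef extends SoftReference<ByteBuffer> {
    final String key;
    KeyedRef(String key, ByteBuffer value, ReferenceQueue<ByteBuffer> q) {
      super(value, q);
      this.key = key;
    }
  }

  private final Map<String, KeyedRef> map = new HashMap<>();
  private final ReferenceQueue<ByteBuffer> queue = new ReferenceQueue<>();

  synchronized void cacheBlock(String key, ByteBuffer block) {
    purgeCleared();
    map.put(key, new KeyedRef(key, block, queue));
  }

  // Returns null on a miss OR if the GC has already cleared the value.
  synchronized ByteBuffer getBlock(String key) {
    purgeCleared();
    KeyedRef ref = map.get(key);
    return ref == null ? null : ref.get();
  }

  synchronized int size() {
    purgeCleared();
    return map.size();
  }

  // Drop map entries whose referents the GC has cleared and enqueued.
  private void purgeCleared() {
    KeyedRef ref;
    while ((ref = (KeyedRef) queue.poll()) != null) {
      // Remove only if the key still maps to this cleared reference.
      map.remove(ref.key, ref);
    }
  }
}
```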


I ran a test that inserted 10M entries, all the same size, into a SoftSortedValueMap.
i is the index of the entry currently being inserted.

size of map before   size of deletes   size of map after   i
1285540              1285537           3                   1285540
1287469              443               1287026             2573006
1287027              2075              1284952             2573007
1284953              692               1284261             2573008
1284262              624               1283638             2573009
1283639              650               1282989             2573010
1282990              672               1282318             2573011
1282319              632               1281687             2573012
1281688              1281679           9                   2573013
1285424              1285423           1                   3858428
1286746              1286745           1                   5145173
1285660              1285659           1                   6430832
1286804              1286802           2                   7717635
1286745              1286744           1                   9004378

So sometimes it evicts hundreds of entries (up to about 2,000) on every insert, and other times
it empties almost the whole map, but waits a long time in between.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705540#action_12705540 ] 

Noble Paul commented on HBASE-1192:
-----------------------------------

bq.. I'd like to try to keep our initial implementation as simple as we can, so will probably take hints from their implementation but write our own. 

Hi Jon, I am a co-author of the Solr cache implementation. The code is slightly complex because it is written for efficiency, but you can easily copy the Java file as is and make small modifications to suit your needs.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Assigned: (HBASE-1192) LRU-style map for the block cache

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson reassigned HBASE-1192:
----------------------------------

    Assignee: ryan rawson  (was: Jonathan Gray)

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671982#action_12671982 ] 

Jim Kellerman commented on HBASE-1192:
--------------------------------------

Looks good so far.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705114#action_12705114 ] 

Shalin Shekhar Mangar commented on HBASE-1192:
----------------------------------------------

Can someone elaborate on why a custom implementation of an LRUCache is being made? I guess you can achieve memory-awareness by extending or wrapping a LinkedHashMap.
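A minimal sketch of that suggestion, assuming String keys, ByteBuffer values, and a byte budget computed from key length plus buffer capacity (the class name is hypothetical). An access-ordered LinkedHashMap already maintains LRU order; the wrapper only adds byte accounting and evicts from the eldest end.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: memory-awareness by wrapping an access-ordered LinkedHashMap.
class LinkedLruCache {
  // accessOrder = true: iteration runs from least- to most-recently accessed.
  private final LinkedHashMap<String, ByteBuffer> map = new LinkedHashMap<>(16, 0.75f, true);
  private final long maxBytes;
  private long usedBytes;

  LinkedLruCache(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  // Rough size estimate; a real cache would also count JVM object overheads.
  private static long cost(String key, ByteBuffer buf) {
    return 2L * key.length() + buf.capacity();
  }

  synchronized void put(String key, ByteBuffer buf) {
    ByteBuffer old = map.put(key, buf);
    if (old != null) usedBytes -= cost(key, old);
    usedBytes += cost(key, buf);
    // Evict eldest entries explicitly until the byte budget is satisfied.
    Iterator<Map.Entry<String, ByteBuffer>> it = map.entrySet().iterator();
    while (usedBytes > maxBytes && it.hasNext()) {
      Map.Entry<String, ByteBuffer> eldest = it.next();
      usedBytes -= cost(eldest.getKey(), eldest.getValue());
      it.remove();
    }
  }

  // get() counts as an access, so it moves the entry to the most-recent end.
  synchronized ByteBuffer get(String key) {
    return map.get(key);
  }

  synchronized long heapSize() {
    return usedBytes;
  }

  synchronized int size() {
    return map.size();
  }
}
```

Returning true from LinkedHashMap.removeEldestEntry removes only the single eldest entry per insertion, which is why this sketch evicts in an explicit loop instead: one large block may need to push out several small ones.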

Also, is the synchronized get/put alright for this use-case? Solr had some issues with synchronized LRUCache when the cache was hit thousands of times per second. SOLR-667 has a good implementation that you can look at.

I do not have the complete background so I may have missed something.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672858#action_12672858 ] 

Andrew Purtell commented on HBASE-1192:
---------------------------------------

+1

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1192:
----------------------------------

    Status: Open  (was: Patch Available)

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1192:
---------------------------------

    Status: Patch Available  (was: Open)

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671971#action_12671971 ] 

Jean-Daniel Cryans commented on HBASE-1192:
-------------------------------------------

+1 for the nifty LRU map.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705242#action_12705242 ] 

Shalin Shekhar Mangar commented on HBASE-1192:
----------------------------------------------

Thanks Jonathan.

bq. On the inside, it is very similar to a LinkedHashMap, just customized to be memory-aware making use of our HeapSize interface. It turned out to be much easier and more efficient to work with the data structures directly.

Sure, if you find it easier that way, that's fine.

{quote}It is possible we will run into contention issues. If we saw issues, my plan was to add buckets a la ConcurrentHashMap.

Looking at the Solr implementation, that looks like what you guys did! Very cool.{quote}

This is what Solr has in trunk right now:
http://svn.apache.org/viewvc/lucene/solr/trunk/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java?view=markup

Also see SOLR-1082 where an ehcache based implementation was also discussed.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671975#action_12671975 ] 

stack commented on HBASE-1192:
------------------------------

Great work lads. +1 on the direction.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705118#action_12705118 ] 

Jonathan Gray commented on HBASE-1192:
--------------------------------------

Shalin,

On the inside, it is very similar to a LinkedHashMap, just customized to be memory-aware making use of our HeapSize interface.  It turned out to be much easier and more efficient to work with the data structures directly.

It is possible we will run into contention issues.  If we saw issues, my plan was to add buckets a la ConcurrentHashMap.

Looking at the Solr implementation, that looks like what you guys did!  Very cool.
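A sketch of the "buckets a la ConcurrentHashMap" idea (lock striping), under stated assumptions: hypothetical class names, the byte budget split evenly across segments, and per-entry cost taken as buffer capacity only. Each segment is an independently synchronized, access-ordered LinkedHashMap, so threads working on keys in different segments do not contend for the same lock.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: striped LRU cache, each stripe with its own lock and byte budget.
class StripedLruCache {
  private static final class Segment {
    final LinkedHashMap<String, ByteBuffer> map = new LinkedHashMap<>(16, 0.75f, true);
    final long maxBytes;
    long usedBytes;

    Segment(long maxBytes) { this.maxBytes = maxBytes; }

    synchronized ByteBuffer get(String key) { return map.get(key); }

    synchronized void put(String key, ByteBuffer buf) {
      ByteBuffer old = map.put(key, buf);
      if (old != null) usedBytes -= old.capacity();
      usedBytes += buf.capacity();
      // Evict this segment's least-recently-used entries until it fits its share.
      Iterator<Map.Entry<String, ByteBuffer>> it = map.entrySet().iterator();
      while (usedBytes > maxBytes && it.hasNext()) {
        usedBytes -= it.next().getValue().capacity();
        it.remove();
      }
    }

    synchronized long usedBytes() { return usedBytes; }
  }

  private final Segment[] segments;

  StripedLruCache(int segmentCount, long totalBytes) {
    segments = new Segment[segmentCount];
    for (int i = 0; i < segmentCount; i++) {
      segments[i] = new Segment(totalBytes / segmentCount);
    }
  }

  // Spread the hash bits, then pick a segment; floorMod keeps the index non-negative.
  private Segment segmentFor(String key) {
    int h = key.hashCode();
    h ^= (h >>> 16);
    return segments[Math.floorMod(h, segments.length)];
  }

  ByteBuffer get(String key) { return segmentFor(key).get(key); }

  void put(String key, ByteBuffer buf) { segmentFor(key).put(key, buf); }

  long heapSize() {
    long total = 0;
    for (Segment s : segments) total += s.usedBytes();
    return total;
  }
}
```

The trade-off is that LRU order is maintained only per segment, and a hot segment cannot borrow budget from a cold one.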

I will read up on the issue and patch.  Thanks Shalin!

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1192:
---------------------------------

    Attachment: hbase-1192-v3.patch

This patch moves LruBlockCache into the .io.hfile package, and it now also implements the BlockCache interface.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705458#action_12705458 ] 

Jonathan Gray commented on HBASE-1192:
--------------------------------------

Hopefully we do see this much concurrency on the block cache.  I read all the Solr code; it looks solid.  I'd like to try to keep our initial implementation as simple as we can, so will probably take hints from their implementation but write our own.

I am going to be working on HBase stuff the second half of this week.  I will take their code for a spin and do some benchmarking/tests.  I think we should back the block cache with a ConcurrentHashMap; I will try to get a patch up by Friday.
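A minimal sketch of a ConcurrentHashMap-backed cache in the general style of the Solr ConcurrentLRUCache linked above: reads take no lock and only bump an access stamp, and when usage passes a high-water mark one thread evicts the oldest-stamped entries down to a low-water mark. This is not a port of the Solr code; the names, the use of buffer capacity as the cost, and the eviction details are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: lock-free reads, stamp-based batch eviction by a single thread.
class ConcurrentLruCache {
  private static final class CachedBlock {
    final ByteBuffer value;
    volatile long lastAccess;
    CachedBlock(ByteBuffer value, long stamp) { this.value = value; this.lastAccess = stamp; }
  }

  // Snapshot of one entry, with its stamp captured once so sorting is stable.
  private static final class Candidate {
    final String key;
    final CachedBlock block;
    final long stamp;
    Candidate(String key, CachedBlock block) {
      this.key = key;
      this.block = block;
      this.stamp = block.lastAccess;
    }
  }

  private final ConcurrentHashMap<String, CachedBlock> map = new ConcurrentHashMap<>();
  private final AtomicLong clock = new AtomicLong();
  private final AtomicLong usedBytes = new AtomicLong();
  private final ReentrantLock evictionLock = new ReentrantLock();
  private final long maxBytes;
  private final long lowWaterBytes;

  ConcurrentLruCache(long maxBytes, long lowWaterBytes) {
    this.maxBytes = maxBytes;
    this.lowWaterBytes = lowWaterBytes;
  }

  ByteBuffer get(String key) {
    CachedBlock b = map.get(key);
    if (b == null) return null;
    b.lastAccess = clock.incrementAndGet(); // no lock on the read path
    return b.value;
  }

  void put(String key, ByteBuffer buf) {
    CachedBlock old = map.put(key, new CachedBlock(buf, clock.incrementAndGet()));
    usedBytes.addAndGet(buf.capacity() - (old == null ? 0 : old.value.capacity()));
    // Only one thread evicts at a time; others skip rather than block.
    if (usedBytes.get() > maxBytes && evictionLock.tryLock()) {
      try {
        evict();
      } finally {
        evictionLock.unlock();
      }
    }
  }

  // Remove the least-recently-stamped entries until usage is at or below the low-water mark.
  private void evict() {
    List<Candidate> candidates = new ArrayList<>();
    for (Map.Entry<String, CachedBlock> e : map.entrySet()) {
      candidates.add(new Candidate(e.getKey(), e.getValue()));
    }
    candidates.sort((a, b) -> Long.compare(a.stamp, b.stamp));
    for (Candidate c : candidates) {
      if (usedBytes.get() <= lowWaterBytes) break;
      // remove(key, value) succeeds only if the entry was not replaced meanwhile.
      if (map.remove(c.key, c.block)) {
        usedBytes.addAndGet(-c.block.value.capacity());
      }
    }
  }

  long heapSize() { return usedBytes.get(); }
}
```

Evicting down to a low-water mark in one batch amortizes the sort over many inserts, at the cost of occasionally dropping more than strictly necessary.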

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671957#action_12671957 ] 

Jonathan Gray commented on HBASE-1192:
--------------------------------------

My proposal is to build upon the work being done in HBASE-1186 and HBASE-1188 to create our own LRU-style Map specialized for the block cache.

A few points as to why I think we should move away from SoftReferences and manage everything ourselves:

- The defined loose constraints and observed non-uniform behavior of SoftReferences
- We're already "managing" heap usage for Memcache.  Using softrefs for the block cache, we'd have something that's almost a black box trying to use all available memory.  This could force memcaches to flush because the RS is under heap pressure.  We won't have much control over fairness between memcaches, indexes, and the block cache if we use softrefs.  I propose we build something very similar to the MemcacheFlusher thread that would handle fairness between the different elements of the RS that use significant heap (memcaches, indexes, block cache, cell cache, in-memory families, blooms, etc.).  As with the new file format, there are going to be more parameters in HBase 0.20 so you can optimize for your use case.  Like the file format, we'll have to come up with reasonable defaults and write more documentation about the effects of the different settings.  Do we want to divide up the total available heap on startup between the different memory consumers, or do we want to leave it wide open for memcaches/indexes/blocks until we're under heap pressure and then decide how to flush or evict fairly?
- The ability to implement in-memory families (as described in the Bigtable paper) very easily, by adding priority to the eviction algorithm
- Full table scans can thrash the cache (at Streamy, we do this only for MR jobs, not user-facing stuff).  With our own structure, we can use a modified LRU algorithm that is resistant to table scans (I'm a fan of ARC, but there are license issues; it's fairly simple to implement this if you configure it manually... ARC is cool because it self-tunes).
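As an illustration of the last two points, a segmented LRU (SLRU) is one simple scan-resistant policy. It is a swapped-in technique, not ARC, and the three-segment layout, names, and budgets below are assumptions. Blocks enter a probation segment, move to a protected segment on a second hit, and blocks from in-memory families get their own segment, so a scan that touches each block once can only churn probation.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;

// Sketch: segmented LRU with probation, protected, and in-memory segments.
class SegmentedLruCache {
  // One access-ordered LRU segment with its own byte budget (cost = buffer capacity).
  private static final class Segment {
    final LinkedHashMap<String, ByteBuffer> map = new LinkedHashMap<>(16, 0.75f, true);
    final long maxBytes;
    long usedBytes;

    Segment(long maxBytes) { this.maxBytes = maxBytes; }

    void put(String key, ByteBuffer buf) {
      ByteBuffer old = map.put(key, buf);
      if (old != null) usedBytes -= old.capacity();
      usedBytes += buf.capacity();
    }

    ByteBuffer remove(String key) {
      ByteBuffer old = map.remove(key);
      if (old != null) usedBytes -= old.capacity();
      return old;
    }
  }

  private final Segment probation;    // blocks seen once (scans land here)
  private final Segment protectedSeg; // blocks hit at least twice
  private final Segment inMemory;     // blocks from in-memory families

  SegmentedLruCache(long probationBytes, long protectedBytes, long inMemoryBytes) {
    probation = new Segment(probationBytes);
    protectedSeg = new Segment(protectedBytes);
    inMemory = new Segment(inMemoryBytes);
  }

  synchronized void cacheBlock(String key, ByteBuffer buf, boolean inMemoryFamily) {
    // Drop any older copy so a key lives in exactly one segment.
    probation.remove(key);
    protectedSeg.remove(key);
    inMemory.remove(key);
    Segment target = inMemoryFamily ? inMemory : probation;
    target.put(key, buf);
    evictOverflow(target, null);
  }

  synchronized ByteBuffer getBlock(String key) {
    ByteBuffer b = inMemory.map.get(key);
    if (b == null) b = protectedSeg.map.get(key);
    if (b != null) return b;
    b = probation.remove(key);
    if (b != null) {
      // Second hit: promote to protected; protected's LRU overflow is demoted to probation.
      protectedSeg.put(key, b);
      evictOverflow(protectedSeg, probation);
      evictOverflow(probation, null);
    }
    return b;
  }

  // Move LRU entries out of `from` until it fits, demoting into `to`, or dropping if null.
  private static void evictOverflow(Segment from, Segment to) {
    while (from.usedBytes > from.maxBytes && !from.map.isEmpty()) {
      String eldest = from.map.keySet().iterator().next();
      ByteBuffer b = from.remove(eldest);
      if (to != null) to.put(eldest, b);
    }
  }
}
```

In this sketch, a block read twice and a block from an in-memory family both survive a long scan of single-use blocks, while only the last few scanned blocks remain in probation.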

Those are my main points.  The primary reason not to go in this direction is simplicity.  However, given what we've learned from OOME hell over the past couple of releases, we must be (and already are) in the business of heap management.  The Streamy guys have done the research and development to do memory management in Java as well as it seems it can be done (based on other open source Java caching apps), so I'm confident we can be correct, efficient, and accurate enough to prevent OOME issues and get optimal performance.

Erik will post his findings from his work experimenting with softref behavior.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1192:
---------------------------------

    Attachment: hbase-1192-v2.patch

Built from the latest LruHashMap, this is a specialized version that only works for <String,ByteBuffer>.

This is so we don't need wrapper classes that implement HeapSize for the key and value (required by LruHashMap).
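To show roughly what heap accounting for such an entry involves, here is a rough heap-cost estimator for one <String, ByteBuffer> entry, assuming a heap (non-direct) buffer. Every constant is an illustrative assumption about a 64-bit JVM layout: none is measured, and none is HBase's actual accounting. Real per-object overheads vary by JVM, version, and flags such as compressed oops.

```java
import java.nio.ByteBuffer;

// Rough heap-cost estimate for one <String, ByteBuffer> cache entry.
// All constants are ASSUMED values for a 64-bit JVM, for illustration only.
final class EntryHeapEstimate {
  static final long OBJECT_HEADER = 16; // assumed object header size
  static final long REFERENCE = 8;      // assumed reference size (no compressed oops)
  static final long ARRAY_HEADER = 24;  // assumed array header incl. length, padded
  static final long ALIGNMENT = 8;      // assumed object alignment

  private EntryHeapEstimate() {}

  // Round a size up to the next multiple of ALIGNMENT.
  static long align(long size) {
    return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  }

  // String object (header + array reference + an assumed 8 bytes of other fields)
  // plus a backing char array at 2 bytes per char.
  static long stringSize(String s) {
    return align(OBJECT_HEADER + REFERENCE + 8) + align(ARRAY_HEADER + 2L * s.length());
  }

  // Buffer object (header + an assumed 48 bytes of fields) plus its backing byte array.
  static long bufferSize(ByteBuffer b) {
    return align(OBJECT_HEADER + 48) + align(ARRAY_HEADER + b.capacity());
  }

  // Map node (header + key, value, next references + an int hash) plus key and value.
  static long entrySize(String key, ByteBuffer value) {
    long node = align(OBJECT_HEADER + 3 * REFERENCE + 4);
    return node + stringSize(key) + bufferSize(value);
  }
}
```

Under these assumptions, a 4-character key with a 100-byte buffer comes to 48 + 64 + 192 = 304 bytes, three times the raw payload, which is why counting only buffer capacity undercounts small blocks.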

Needs testing, but it is expected to work; there are very few changes.

This patch depends on the latest HeapSize, as posted in the most recent HBASE-1186 patch (v4).

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-1192:
-------------------------------

    Affects Version/s: 0.20.0
               Status: Patch Available  (was: Open)

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1192-ryan.patch, hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Updated: (HBASE-1192) LRU-style map for the block cache

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1192:
-------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Thanks for the patch, Ryan and Jon. The SOLR-suggested improvements have been moved to HBASE-1460. I will take up my issues with the current patch over there.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>            Assignee: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1192-ryan.patch, hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.


[jira] Commented: (HBASE-1192) LRU-style map for the block cache

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689333#action_12689333 ] 

Andrew Purtell commented on HBASE-1192:
---------------------------------------

This is what I get when I try to apply and compile the v3 patch:
{code}
    [javac] /usr/src/Hadoop/hbase-trunk/src/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java:210: cannot find symbol
    [javac] symbol  : variable blockNum
    [javac] location: class org.apache.hadoop.hbase.io.hfile.LruBlockCache
    [javac]     put(blockNum,buf);
    [javac]         ^
    [javac] /usr/src/Hadoop/hbase-trunk/src/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java:1140: cannot find symbol
    [javac] symbol  : variable STRING
    [javac] location: interface org.apache.hadoop.hbase.io.HeapSize
    [javac]       return HeapSize.STRING + alignSize(s.length()*2);
    [javac]                      ^
    [javac] /usr/src/Hadoop/hbase-trunk/src/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java:1140: incompatible types
    [javac] found   : <nulltype>
    [javac] required: long
    [javac]       return HeapSize.STRING + alignSize(s.length()*2);
    [javac]                              ^
    [javac] /usr/src/Hadoop/hbase-trunk/src/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java:1148: cannot find symbol
    [javac] symbol  : variable BYTEBUFFER
    [javac] location: interface org.apache.hadoop.hbase.io.HeapSize
    [javac]       return HeapSize.BYTEBUFFER + alignSize(b.capacity());
    [javac]                      ^
    [javac] /usr/src/Hadoop/hbase-trunk/src/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java:1148: incompatible types
    [javac] found   : <nulltype>
    [javac] required: long
    [javac]       return HeapSize.BYTEBUFFER + alignSize(b.capacity());
    [javac]                                  ^
    [javac] 5 errors
{code}
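The last four errors all stem from HeapSize not declaring the STRING and BYTEBUFFER constants that the patch references. The "incompatible types" errors appear to be follow-on errors from the unresolved symbols. This is consistent with the HBASE-1186 v4 HeapSize dependency noted for the v2 patch. The blockNum error is separate: it is an undefined variable in the patch itself. The sketch below hypothetically shows the interface shape the patch appears to expect. HeapSizeSketch, HeapSizeCalc, and the constant values are assumptions, not the contents of HBASE-1186.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: an interface declaring the overhead constants that the
 * v3 patch references as HeapSize.STRING and HeapSize.BYTEBUFFER.
 * The byte values are assumptions; real overheads depend on the JVM.
 */
interface HeapSizeSketch {
  long STRING = 40;      // assumed fixed overhead of a String, in bytes
  long BYTEBUFFER = 48;  // assumed fixed overhead of a ByteBuffer, in bytes

  /** Estimated heap footprint of the implementing object, in bytes. */
  long heapSize();
}

/** Hypothetical helpers mirroring the two failing return statements. */
final class HeapSizeCalc {
  private HeapSizeCalc() {}

  /** Round up to a multiple of 8 bytes, the usual JVM object alignment. */
  static long alignSize(long n) {
    return (n + 7) & ~7L;
  }

  /** Mirrors line 1140: fixed overhead plus 2 bytes per char, as the patch assumes. */
  static long stringSize(String s) {
    return HeapSizeSketch.STRING + alignSize(s.length() * 2L);
  }

  /** Mirrors line 1148: fixed overhead plus the buffer's capacity. */
  static long byteBufferSize(ByteBuffer b) {
    return HeapSizeSketch.BYTEBUFFER + alignSize(b.capacity());
  }
}
```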

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hbase-1192-v1.patch, hbase-1192-v2.patch, hbase-1192-v3.patch
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.
