Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/07/02 22:55:47 UTC

[jira] Created: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache
-----------------------------------------------------------------------------------

                 Key: HBASE-1607
                 URL: https://issues.apache.org/jira/browse/HBASE-1607
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.20.0
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
             Fix For: 0.20.0


MemStore sizing is inaccurate and does not include all overhead.

I'm going to make it look like the LruBlockCache does.  Will provide a MemStore.heapSize() method that includes ALL overhead of the MemStore itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726713#action_12726713 ] 

Jonathan Gray commented on HBASE-1607:
--------------------------------------

MemStore is now public so that I can test it from TestHeapSize.  Otherwise, I could add a separate TestHeapSize inside the regionserver package?


[jira] Updated: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1607:
---------------------------------

    Status: Patch Available  (was: Open)

Ready for review.  The patch adds a lot but does not touch any major code path apart from MemStore.heapSize -> MemStore.heapSizeChange and the constants used in it.

All the heap sizing tests are passing for me on 32-bit Windows and 64-bit Linux.  Please verify that io.TestHeapSize passes.


[jira] Updated: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1607:
---------------------------------

    Attachment: HBASE-1607-v1.patch

The patch does what is described in the JIRA description.

Two things to note:

- MemStore, Store, and HRegion now all implement HeapSize.  This is not really being used yet, but it is how LruBlockCache works and will allow us to have intelligent load balancing, control of memory usage, etc.

- The Store itself is still tracking the size of the keys that have been added into the MemStore, rather than querying MemStore.heapSize().  I don't see another way, for now, because it decrements at the right time based on the snapshot and flush.  So MemStore.heapSize() is currently unused.

So what this patch has actually done is make the incremental sizing of each addition to MemStore more accurate.  It also ensures that any change to the objects involved will cause test failures if the overheads are not updated accordingly.
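
As a rough illustration of the incremental sizing described in the note above, here is a minimal Java sketch.  It is not the code from the patch: HeapSizeSketch, MemStoreSketch, and both overhead constants are hypothetical names and placeholder values chosen for the example, and keys are plain Strings rather than KeyValues.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a HeapSize-style interface. */
interface HeapSizeSketch {
    /** Approximate heap footprint of this object. */
    long heapSize();
}

/** Toy store that keeps a running size instead of recomputing it. */
class MemStoreSketch implements HeapSizeSketch {
    // Placeholder values, NOT the constants from the patch.
    static final long FIXED_OVERHEAD = 64; // assumed cost of the empty store
    static final long ENTRY_OVERHEAD = 48; // assumed cost of one map entry

    private final ConcurrentSkipListMap<String, byte[]> map =
        new ConcurrentSkipListMap<>();
    private final AtomicLong size = new AtomicLong(FIXED_OVERHEAD);

    /**
     * Size delta caused by one put. Simplified: counts key characters plus
     * value bytes, and assumes no change when the key already existed.
     */
    long heapSizeChange(String key, byte[] value, boolean notPresent) {
        return notPresent ? ENTRY_OVERHEAD + key.length() + value.length : 0;
    }

    /** Adds an entry and returns the delta so a caller can track it too. */
    long add(String key, byte[] value) {
        boolean notPresent = map.put(key, value) == null;
        long delta = heapSizeChange(key, value, notPresent);
        size.addAndGet(delta);
        return delta;
    }

    @Override
    public long heapSize() {
        return size.get();
    }
}
```

Returning the delta from add() mirrors the situation in the second bullet: an outer object can sum the deltas itself and decrement them when it sees fit, without ever calling heapSize().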


[jira] Updated: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1607:
---------------------------------

    Attachment: HBASE-1607-v2.patch

Removes any previous changes to HTable.

Updates the ConcurrentSkipListMap entry size according to research by Erik.


[jira] Commented: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726747#action_12726747 ] 

stack commented on HBASE-1607:
------------------------------

Remove stuff rather than commenting it out.

Is the ClassSize utility meant to estimate the size of a Class or of an instance of a Class?  If the latter, it should be renamed.

All these DEEP_OVERHEAD and FIXED_OVERHEAD defines make me nervous.  They would seem to be very brittle and to make changes hard.  Where are they used?  In tests only?


[jira] Updated: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1607:
-------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Reviewed and ran tests.  Thanks for the patch, Jon.


[jira] Commented: (HBASE-1607) Redo MemStore heap sizing to be accurate, testable, and more like new LruBlockCache

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726754#action_12726754 ] 

Jonathan Gray commented on HBASE-1607:
--------------------------------------

Did I comment out stuff?  Didn't think I had.  This patch might remove previously commented out stuff.

ClassSize is a utility to help us size classes.  It generates the sizes for the different native Java classes that we use in heap-sizable classes.  It does nothing with an instantiated class.


I agree that FIXED_OVERHEAD and DEEP_OVERHEAD are not necessarily the best long term solution out there :)  Currently in trunk, our MemStore is sized with this beautiful line:

{noformat}
  private final static int ESTIMATED_KV_HEAP_TAX = 60;

  long heapSize(final KeyValue kv, final boolean notpresent) {
    return notpresent?
      // Add overhead for value byte array and for Map.Entry -- 57 bytes
      // on x64 according to jprofiler.
      ESTIMATED_KV_HEAP_TAX + 57 + kv.getLength(): 0; // Guess no change in size.
  }
{noformat}

So this is a vast improvement.  It might look nasty, but it's not a hard-coded number; rather, it is calculated.  Interestingly, 57 bytes is not a valid size for something in memory because everything is aligned (4-byte on 32-bit, 8-byte on 64-bit).
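
To make the alignment point concrete, here is a small Java sketch of rounding a size up to an alignment boundary.  AlignSketch and its align method are hypothetical names for this example, not the helper from the patch, and the alignment values are simply the ones quoted above.

```java
/** Hypothetical helper that rounds sizes up to an alignment boundary. */
class AlignSketch {
    /** Rounds num up to the next multiple of alignment (alignment > 0). */
    static long align(long num, long alignment) {
        return (num + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        System.out.println(align(57, 8)); // 64: 57 bytes occupies 64 with 8-byte alignment
        System.out.println(align(57, 4)); // 60: and 60 with 4-byte alignment
        System.out.println(align(56, 8)); // 56: already aligned, unchanged
    }
}
```

Under these assumptions an odd figure such as 57 can only be an average or an estimate, never the footprint of a single object.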

Again, I set out with two goals.  Make sizing as accurate as possible, and make tests so that if any of our sized classes change the tests will fail.  This has been done with this patch and those before it.

So FIXED_OVERHEAD might look confusing, but it's principled and shows where the sizing is coming from.  When the unit test fails, ClassSize outputs its sizing in debug mode so you can very easily see exactly what you missed.  The message is something like:  Expected <104> but got <96>.  And there is output of how many references, primitives, etc. it found.  Quite simple to fix.  If you want to play with it, just modify one of the FIXED_OVERHEADs and run TestHeapSize.

This is not a permanent solution, perhaps, but it works well and is far better than the hard-coded values we have now.
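
The kind of check described above could look roughly like the following Java sketch, which counts a class's declared fields by reflection and compares the result with a hand-written constant.  This is an illustration under stated assumptions, not the ClassSize or TestHeapSize code: ClassSizeSketch and SizedThing are hypothetical, and the 16-byte object header and 8-byte reference are assumed values for a 64-bit JVM.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical reflection-based estimator of a class's shallow size. */
class ClassSizeSketch {
    // Assumed sizes for a 64-bit JVM; real values depend on the JVM.
    static final int OBJECT_HEADER = 16;
    static final int REFERENCE = 8;

    /** Rounds n up to the next multiple of 8. */
    static long align(long n) {
        return ((n + 7) >> 3) << 3;
    }

    /** Sums header plus instance fields of cl and its superclasses. */
    static long estimate(Class<?> cl) {
        long size = OBJECT_HEADER;
        for (Class<?> c = cl; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                // Static and compiler-generated fields are not counted.
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                Class<?> t = f.getType();
                if (!t.isPrimitive()) size += REFERENCE;
                else if (t == long.class || t == double.class) size += 8;
                else if (t == int.class || t == float.class) size += 4;
                else if (t == short.class || t == char.class) size += 2;
                else size += 1; // byte or boolean
            }
        }
        return align(size);
    }
}

/** Example class that declares its own overhead by hand. */
class SizedThing {
    // header + one reference + one long + one int, then aligned
    static final long FIXED_OVERHEAD = ClassSizeSketch.align(16 + 8 + 8 + 4);

    private byte[] data;
    private long timestamp;
    private int length;
}
```

A test then only needs to assert that ClassSizeSketch.estimate(SizedThing.class) equals SizedThing.FIXED_OVERHEAD.  Adding a field to SizedThing without updating the constant makes the two numbers diverge, which is the failure mode the comment describes.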
