Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/06/29 23:02:47 UTC

[jira] Created: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Extend TestHeapSize and ClassSize to do "deep" sizing of Objects
----------------------------------------------------------------

                 Key: HBASE-1590
                 URL: https://issues.apache.org/jira/browse/HBASE-1590
             Project: Hadoop HBase
          Issue Type: Improvement
    Affects Versions: 0.20.0
            Reporter: Jonathan Gray
             Fix For: 0.20.0


As discussed in HBASE-1554 there is a bit of a disconnect between how ClassSize calculates the heap size and how we need to calculate heap size in our implementations.

For example, the LRU block cache can be sized via ClassSize, but it is a shallow sizing.  There is a backing ConcurrentHashMap that is the largest memory consumer.  However, ClassSize only counts that as a single reference.  But in our heapSize() reporting, we want to include *everything* within that Object.

This issue is to resolve that dissonance.  We may need to create an additional ClassSize.estimateDeep(), rethink our HeapSize interface, or just leave things as they are.  The two primary goals of all this testing are 1) to ensure that if something is changed and the sizing is not updated, our tests fail, and 2) to ensure our sizing is as accurate as possible.
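A minimal sketch of goal 1, assuming a simplified 64-bit layout: 16-byte object header, 8-byte references and 8-byte alignment. Real values depend on the JVM and its flags. ExampleCache, FIXED_OVERHEAD and shallowSize() are hypothetical names; this is not HBase's ClassSize code. The sketch compares a reflection-derived shallow estimate against a hand-maintained constant, so adding a field without updating the constant makes the check fail. The ConcurrentHashMap field contributes only one reference, which is the shallow-sizing gap described above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.concurrent.ConcurrentHashMap;

public class FixedOverheadCheck {
  // Assumed 64-bit layout without compressed oops; real values depend on the JVM and its flags.
  static final int OBJECT_HEADER = 16;
  static final int REFERENCE = 8;
  static final int ALIGNMENT = 8;

  /** Hypothetical cache class; the backing map is just one reference in a shallow estimate. */
  static class ExampleCache {
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();
    private long maxSize;
    private long currentSize;
    private float evictionFactor;

    // Hand-maintained constant: header + 1 reference + 2 longs + 1 float, aligned.
    static final long FIXED_OVERHEAD = align(OBJECT_HEADER + REFERENCE + 2 * 8 + 4);
  }

  /** Rounds n up to the next multiple of ALIGNMENT. */
  static long align(long n) {
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  }

  static int primitiveSize(Class<?> t) {
    if (t == long.class || t == double.class) return 8;
    if (t == int.class || t == float.class) return 4;
    if (t == short.class || t == char.class) return 2;
    return 1; // byte, boolean
  }

  /** Shallow estimate: header plus every instance field up the hierarchy; references count once. */
  static long shallowSize(Class<?> cls) {
    long size = OBJECT_HEADER;
    for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        if (Modifier.isStatic(f.getModifiers())) continue;
        Class<?> t = f.getType();
        size += t.isPrimitive() ? primitiveSize(t) : REFERENCE;
      }
    }
    return align(size);
  }

  public static void main(String[] args) {
    long estimated = shallowSize(ExampleCache.class);
    // Goal 1: adding a field to ExampleCache without updating FIXED_OVERHEAD fails here.
    if (estimated != ExampleCache.FIXED_OVERHEAD) {
      throw new AssertionError("FIXED_OVERHEAD " + ExampleCache.FIXED_OVERHEAD
          + " != estimated " + estimated);
    }
    System.out.println("FIXED_OVERHEAD matches: " + estimated);
  }
}
```

Under these assumptions both values are 48 bytes (16 + 8 + 8 + 8 + 4 = 44, aligned to 48). Goal 2, accuracy, still requires counting what the map itself holds.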

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Erik Holstad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726234#action_12726234 ] 

Erik Holstad commented on HBASE-1590:
-------------------------------------

Have been working on a deepClassSize and have a working version of it. A couple of things make the whole concept of checking the size of a class, rather than an object, hard. Take TreeMap as an example: it has a reference to the root entry, which in turn has references to the left, right and parent entries, so how do you know when to stop?

Of the two main goals, we already have 1, so 2 is what's left.
One thing we could do is lift some test code using Instrumentation.getObjectSize() into a test, so we don't have to include the jar. The problem then is how to run it, since at the moment it requires -javaagent:/home/erik/src/tgzs/SizeOf.jar. I'll see if I can work around this so it can be used in a unit test.
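The Instrumentation route can be sketched as a tiny java agent. SizeOfAgent is a hypothetical name. The only real APIs used are the premain(String, Instrumentation) entry point and Instrumentation.getObjectSize(Object). The agent jar's manifest must declare "Premain-Class: SizeOfAgent", and the JVM must be started with -javaagent:<path-to-jar>, which is the same test-running problem described in the comment. getObjectSize() returns a shallow, implementation-specific estimate and does not follow references, so deep sizing still needs a traversal on top of it.

```java
import java.lang.instrument.Instrumentation;

/**
 * Minimal java agent sketch. Package it in a jar whose manifest declares
 * "Premain-Class: SizeOfAgent" and start the JVM with -javaagent:path/to/agent.jar.
 */
public final class SizeOfAgent {
  private static volatile Instrumentation inst;

  private SizeOfAgent() {}

  /** Called by the JVM before main() when the agent is loaded via -javaagent. */
  public static void premain(String agentArgs, Instrumentation instrumentation) {
    inst = instrumentation;
  }

  /** True only if the JVM was started with this agent. */
  public static boolean isLoaded() {
    return inst != null;
  }

  /**
   * Shallow size of obj as reported by the JVM, or -1 if the agent is not loaded.
   * getObjectSize does not follow references held by obj.
   */
  public static long sizeOf(Object obj) {
    Instrumentation i = inst;
    return i == null ? -1L : i.getObjectSize(obj);
  }
}
```

Returning -1 when the agent is absent lets a unit test detect the missing -javaagent flag and skip, instead of failing outright.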


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725440#action_12725440 ] 

Jonathan Gray commented on HBASE-1590:
--------------------------------------

I'm not sure we need to do this anymore.  The patch going in for HBASE-1591 cleaned up LruBlockCache heap sizing; it works well now and is accurate.

Remaining issues are:

- How do we reliably size the internal, non-public members of classes like ConcurrentHashMap (Entry and Segment)?  We can use SizeOf, but I would rather do some hackery/reflection business so we can dig in with ClassSize.
- Review the MemStore heapSize() implementation: same issue as above, for ConcurrentSkipListMap.
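For the reflection route, a read-only first step could be to list the instance fields of the JDK's internal entry classes. Hand-maintained per-entry constants can then be checked against the real layout. This is a sketch, and the class names are JDK-version specific. On JDK 8 and later, ConcurrentHashMap stores entries in ConcurrentHashMap$Node, and Segment survives only for serialization compatibility. The JDK 6 implementation of 2009 used Segment and HashEntry instead. Reading field metadata with getDeclaredFields() needs no special access. Reading field values through setAccessible() is a different matter: on JDK 16 and later it fails for these classes unless the JVM is started with --add-opens java.base/java.util.concurrent=ALL-UNNAMED. InternalLayout and instanceFields() are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class InternalLayout {
  /**
   * Returns "Type name" for each instance field declared directly by the named class.
   * Only field metadata is read, so no setAccessible() call is needed.
   */
  static List<String> instanceFields(String className) {
    Class<?> cls;
    try {
      cls = Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Class not found (names vary by JDK version): " + className, e);
    }
    List<String> fields = new ArrayList<>();
    for (Field f : cls.getDeclaredFields()) {
      if (!Modifier.isStatic(f.getModifiers())) {
        // Generic fields such as "K key" erase to Object.
        fields.add(f.getType().getSimpleName() + " " + f.getName());
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // Assumption: JDK 8 or later; JDK 6 used ConcurrentHashMap$Segment and $HashEntry instead.
    String[] names = {
      "java.util.concurrent.ConcurrentHashMap$Node",
      "java.util.concurrent.ConcurrentSkipListMap$Node"
    };
    for (String name : names) {
      try {
        System.out.println(name + " -> " + instanceFields(name));
      } catch (IllegalArgumentException e) {
        System.out.println(e.getMessage());
      }
    }
  }
}
```

A test could assert that this list matches what a per-entry constant assumes, so a JDK upgrade that changes the layout is noticed.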


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725469#action_12725469 ] 

Jonathan Gray commented on HBASE-1590:
--------------------------------------

Let's keep it open; I'd like to get Erik's input tomorrow.

We need to address the two issues above, and this is as fine a place as any.


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Erik Holstad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726272#action_12726272 ] 

Erik Holstad commented on HBASE-1590:
-------------------------------------

@Nitay, yes, that would work if we were sizing Objects, but right now we are only dealing with classes, so that approach is very hard to apply.

@Stack, yup, it is GPL. I just wasn't sure how you would add specific JVM arguments to Hudson. I've been trying to get it to work from within Eclipse without setting the arguments, but without any luck so far. It seems some tools ship with the Sun JDK but not with OpenJDK until 7. So being able to run it with the arguments for now would be really nice.


[jira] Updated: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1590:
---------------------------------

    Fix Version/s:     (was: 0.20.0)
                   0.20.1

Punting to 0.20.1. Doing something for this will be useful, but let's not hold up 0.20.0.

What needs to be done for 0.20.0 will now be handled in HBASE-1607.


[jira] Reopened: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-1590:
--------------------------


Mind if I keep this open, JG?  I think it'd be sweet to integrate the SizeOf jar, fetching it only if the user asks for it.  Maybe once we move the build to Ivy it wouldn't be too hard.


[jira] Resolved: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-1590.
----------------------------------

    Resolution: Won't Fix
      Assignee: Jonathan Gray

All testing on 0.20 shows we are more than okay with respect to our heap sizing.  I'll open a new issue against 0.21 if we need any further improvements.


[jira] Updated: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1590:
-------------------------

         Priority: Minor  (was: Major)
    Fix Version/s:     (was: 0.20.1)
                   0.21.0
         Assignee: stack  (was: Jonathan Gray)

Assigned it to me, moved it to 0.21 and lowered the priority to Minor.


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726260#action_12726260 ] 

stack commented on HBASE-1590:
------------------------------

@holstad, is Instrumentation.getObjectSize() a SizeOf call?  SizeOf is GPL, right?  Let me know if you want me to work on the build to add support like we have for Clover, where you point at a SizeOf install and then run an ant task with -javaagent:/home/erik/src/tgzs/SizeOf.jar.  We could run this as part of the Hudson build (I think; maybe GPL code is disallowed on Hudson, would have to see), or we could run it as part of the release.


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726239#action_12726239 ] 

Nitay Joffe commented on HBASE-1590:
------------------------------------

What if you maintain a Set<Object> of references that have already been counted? That way you can traverse any data structure and check whether you need to recurse. For example, when you get to the 'parent' reference, you'll see it has already been counted, so you count only the reference itself without recursing into it.
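The visited-set idea can be sketched as an iterative reflective traversal. The set is identity-based (IdentityHashMap) rather than equals()-based. Two distinct but equal objects each occupy memory, and a hashCode() that walks a cyclic structure can recurse without end. Sizes follow a simplified, assumed 64-bit layout: a 16-byte header, a 24-byte array header, 8-byte references and 8-byte alignment. DeepSizer and Node are illustrative names. As Erik points out, this needs an actual object instance, not a class. Also, on JDK 16 and later, setAccessible() on JDK-internal fields (for example inside TreeMap) throws unless the package is opened with --add-opens, so the demo uses a user-defined node.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class DeepSizer {
  // Simplified, assumed 64-bit layout without compressed oops; real JVMs differ.
  static final long HEADER = 16, ARRAY_HEADER = 24, REF = 8;

  static long align(long n) {
    return (n + 7) & ~7L; // round up to a multiple of 8
  }

  static long primitiveSize(Class<?> t) {
    if (t == long.class || t == double.class) return 8;
    if (t == int.class || t == float.class) return 4;
    if (t == short.class || t == char.class) return 2;
    return 1; // byte, boolean
  }

  /** Deep size of everything reachable from root; each object is counted once by identity. */
  static long deepSize(Object root) {
    Set<Object> counted = Collections.newSetFromMap(new IdentityHashMap<>());
    Deque<Object> pending = new ArrayDeque<>(); // explicit stack: no recursion depth limit
    if (root != null) pending.push(root);
    long total = 0;
    while (!pending.isEmpty()) {
      Object o = pending.pop();
      if (!counted.add(o)) continue; // already counted, e.g. reached again via 'parent'
      Class<?> c = o.getClass();
      if (c.isArray()) {
        Class<?> ct = c.getComponentType();
        int len = Array.getLength(o);
        total += align(ARRAY_HEADER + len * (ct.isPrimitive() ? primitiveSize(ct) : REF));
        if (!ct.isPrimitive()) {
          for (int i = 0; i < len; i++) {
            Object e = Array.get(o, i);
            if (e != null) pending.push(e);
          }
        }
        continue;
      }
      long size = HEADER;
      for (Class<?> k = c; k != null; k = k.getSuperclass()) {
        for (Field f : k.getDeclaredFields()) {
          if (Modifier.isStatic(f.getModifiers())) continue;
          Class<?> t = f.getType();
          if (t.isPrimitive()) {
            size += primitiveSize(t);
            continue;
          }
          size += REF; // the reference itself; its target is counted when popped
          try {
            f.setAccessible(true); // throws for JDK internals on JDK 16+ unless opened
            Object v = f.get(o);
            if (v != null) pending.push(v);
          } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
          }
        }
      }
      total += align(size);
    }
    return total;
  }

  /** User-defined node with a parent back-reference, like the TreeMap entries discussed above. */
  static final class Node {
    Node left, right, parent;
    int value;

    Node(int value, Node parent) {
      this.value = value;
      this.parent = parent;
    }
  }

  public static void main(String[] args) {
    Node root = new Node(1, null);
    root.left = new Node(0, root);
    root.right = new Node(2, root);
    System.out.println(deepSize(root)); // 3 nodes x align(16 + 3*8 + 4) = 144 under this model
  }
}
```

The children's parent references lead back to the root, which is already in the set, so each node is counted once and the traversal terminates.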


[jira] Commented: (HBASE-1590) Extend TestHeapSize and ClassSize to do "deep" sizing of Objects

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725458#action_12725458 ] 

stack commented on HBASE-1590:
------------------------------

Move it out of 0.20.0?
