Posted to issues@hbase.apache.org by "Misha Dmitriev (JIRA)" <ji...@apache.org> on 2016/10/19 00:23:59 UTC

[jira] [Comment Edited] (HBASE-10656) high-scale-lib's Counter depends on Oracle (Sun) JRE, and also has some bug

    [ https://issues.apache.org/jira/browse/HBASE-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587146#comment-15587146 ] 

Misha Dmitriev edited comment on HBASE-10656 at 10/19/16 12:23 AM:
-------------------------------------------------------------------

To be more specific, our RegionServers end up with millions of Counter$Cell objects in memory:

{code}
 #instances     Shallow size     #instances    Shallow size       Class name
  garbage       garbage           live         live                         
----------------------------------------------------------------------------
 2,985,951   396,571K (32.8%)      766,919    101,856K (8.4%)     org.apache.hadoop.hbase.util.Counter$Cell
 2,985,949     69,983K (5.8%)      766,918     17,974K (1.5%)     org.apache.hadoop.hbase.util.Counter$Cell[]
{code}
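As a rough cross-check of the histogram above, dividing each shallow size by its instance count gives the average footprint per object. The sketch below assumes that "K" means 1024 bytes; the class and method names are made up for illustration. Under that assumption the result is roughly 136 bytes per Counter$Cell and roughly 24 bytes per Counter$Cell[], which is consistent with each Cell carrying a run of padding fields next to its value (the exact field layout depends on the JVM and is not shown in this thread).

```java
public class CellFootprint {
    /**
     * Average shallow size in bytes per instance.
     * Assumption: the histogram's "K" unit is 1024 bytes.
     */
    public static double bytesPerInstance(long totalKilobytes, long instances) {
        return totalKilobytes * 1024.0 / instances;
    }

    public static void main(String[] args) {
        // Figures copied from the histogram above.
        System.out.printf("Cell   garbage: %.1f bytes%n", bytesPerInstance(396_571, 2_985_951));
        System.out.printf("Cell   live:    %.1f bytes%n", bytesPerInstance(101_856, 766_919));
        System.out.printf("Cell[] garbage: %.1f bytes%n", bytesPerInstance(69_983, 2_985_949));
        System.out.printf("Cell[] live:    %.1f bytes%n", bytesPerInstance(17_974, 766_918));
    }
}
```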

I think there is no point in discussing optimizations that forcibly prevent two Cell objects from sharing a single cache line when there are so many of these objects that they simply blow up memory.

A simple fix would be to remove the extra padding long fields. However, I am totally new to HBase and don't know whether these objects always exist in such large numbers. In some setups there may be very few of them. If so, it might make sense to have two alternative implementations of Cell:
 - one that assumes a small number of objects and is optimized for cache speed, as now
 - another that is as compact as possible.
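The two alternatives could look roughly like the sketch below. This is not the actual org.apache.hadoop.hbase.util.Counter code: the interface, the class names, the padding field names, and the newCell selector are all hypothetical, and the JVM is free to lay fields out differently, so the padding is a best-effort hint rather than a guarantee.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CellVariants {
    /** Hypothetical common interface for both variants. */
    public interface Cell {
        void add(long delta);
        long get();
    }

    /** Padded variant: extra long fields try to keep neighbouring cells off the same cache line. */
    static final class PaddedCell extends AtomicLong implements Cell {
        // Padding only, never read. Names are illustrative.
        @SuppressWarnings("unused")
        volatile long p0, p1, p2, p3, p4, p5, p6;

        @Override
        public void add(long delta) {
            addAndGet(delta);
        }
        // get() is inherited from AtomicLong.
    }

    /** Compact variant: no padding, so each instance stays as small as possible. */
    static final class CompactCell extends AtomicLong implements Cell {
        @Override
        public void add(long delta) {
            addAndGet(delta);
        }
    }

    /** Hypothetical selector: pad only when few cells are expected. */
    public static Cell newCell(boolean expectFewCells) {
        return expectFewCells ? new PaddedCell() : new CompactCell();
    }
}
```

How the choice would be made in practice (a configuration switch, a count of live counters, or something else) is left open here.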


>  high-scale-lib's Counter depends on Oracle (Sun) JRE, and also has some bug
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-10656
>                 URL: https://issues.apache.org/jira/browse/HBASE-10656
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Hiroshi Ikeda
>            Assignee: Hiroshi Ikeda
>            Priority: Minor
>             Fix For: 0.96.2, 0.98.1, 0.99.0
>
>         Attachments: 10656-098.v2.txt, 10656-trunk.v2.patch, 10656.096v2.txt, HBASE-10656-0.96.patch, HBASE-10656-addition.patch, HBASE-10656-trunk.patch, MyCounter.java, MyCounter2.java, MyCounter3.java, MyCounterTest.java, MyCounterTest.java, PerformanceTestApp.java, PerformanceTestApp2.java, output.pdf, output.txt, output2.pdf, output2.txt
>
>
> Cliff's high-scale-lib Counter is used in important HBase classes (for example, HRegion), but Counter uses sun.misc.Unsafe, which is an implementation detail of the Java standard library and belongs to Oracle (Sun). Consequently, HBase depends on a specific JRE implementation.
> To make matters worse, Counter has a bug: you may get a wrong result if you mix a reading method into logic that calls writing methods.
> In more detail, I think the bug is caused by reading an internal array field without any memory synchronization (which is intentional, according to the comment) and then storing the result of the read in a volatile field. That field may not be updated after the true values of the array field become visible, and it also may not be updated after the "next" CAT instance's values change in some race conditions when the CAT instance chain is extended.
> In any case, it is possible to create an alternative class that depends only on the standard library. I know Java 8 provides an alternative, but HBase should continue to support Java 6 and Java 7 for some time.
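
To illustrate what a counter that depends only on the standard library might look like, here is a minimal sketch built on java.util.concurrent.atomic.AtomicLongArray, which has been available since Java 5. It is not the MyCounter class from the attachments (their contents are not shown in this message); the class name, the stripe selection by thread hash, and the fixed stripe count are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class StripedCounter {
    private final AtomicLongArray stripes;
    private final int mask;

    public StripedCounter(int stripeCount) {
        // Round up to a power of two so a bit mask can select a stripe.
        int n = 1;
        while (n < stripeCount) {
            n <<= 1;
        }
        stripes = new AtomicLongArray(n);
        mask = n - 1;
    }

    /** Adds delta to the stripe chosen from the calling thread's hash. */
    public void add(long delta) {
        int h = Thread.currentThread().hashCode();
        h ^= (h >>> 16); // spread the high bits into the low bits
        stripes.addAndGet(h & mask, delta);
    }

    /** Sums all stripes. Not an atomic snapshot while writers are active. */
    public long get() {
        long sum = 0;
        for (int i = 0; i < stripes.length(); i++) {
            sum += stripes.get(i);
        }
        return sum;
    }
}
```

Adjacent stripes in the array can share a cache line, so this trades some contention for a small footprint and no use of sun.misc.Unsafe.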


