Posted to issues@hbase.apache.org by "jian zhang (JIRA)" <ji...@apache.org> on 2011/05/20 06:30:47 UTC

[jira] [Created] (HBASE-3906) When HMaster is running,there are a lot of HServerLoad instances（far greater than the regions）,it has risk of OOME.

When HMaster is running,there are a lot of HServerLoad instances（far greater than the regions）,it has risk of OOME.
-------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3906
                 URL: https://issues.apache.org/jira/browse/HBASE-3906
             Project: HBase
          Issue Type: Improvement
          Components: master
    Affects Versions: 0.90.2
         Environment: 1 hmaster,4 regionserver,about 100,000 regions.
            Reporter: jian zhang
             Fix For: 0.90.4


1. Start the HBase cluster.
2. After HMaster finishes region assignment, use jmap to dump HMaster's memory.
3. Use MAT to analyse the dump file: there are too many HServerLoad instances, and they occupy more than 3G of memory.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian zhang updated HBASE-3906:
------------------------------

    Attachment: HBASE-3906.patch

All 24M objects are live; the count does not include dead objects.
I have tested the scenarios below with this new patch:
1. Start the cluster normally, insert data, and then dump the HMaster memory.
2. While the cluster is running, kill the active HMaster so that the standby HMaster becomes active, then dump the new active HMaster's memory.
3. Kill a regionserver or add a new one to the running cluster; when balancing has finished, dump the HMaster memory.

In all of the scenarios above, the HMaster holds no unnecessary HServerLoad objects, and balancing still works.


> When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3906
>                 URL: https://issues.apache.org/jira/browse/HBASE-3906
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.90.2, 0.90.3
>         Environment: 1 hmaster,4 regionserver,about 100,000 regions.
>            Reporter: jian zhang
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3906.patch, HBASE-3906.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1. Start the HBase cluster.
> 2. After HMaster finishes region assignment, use jmap to dump HMaster's memory.
> 3. Use MAT to analyse the dump file: there are too many RegionLoad instances, and they occupy more than 3G of memory.

[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037011#comment-13037011 ] 

stack commented on HBASE-3906:
------------------------------

@Ted I think the patch is for the branch only. The branch has the problem; I don't believe TRUNK does.

@Jian This should work, though it's ugly, i.e. refreshing an HServerInfo instance. (Do we need to keep the load in the Map of regions?  What about clearing the load from the HSI we add to the Map of regions to HSI?  Would that work?  Or is this Map used in balancing?)  Does your patch work for you?  Any issues w/ the new synchronize blocks?
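
A minimal sketch of the idea behind the "clear the load" question, for illustration only. The classes below (`ServerInfoStub`, `RegionLoadStub`) are hypothetical stand-ins, not the real HBase 0.90 `HServerInfo`/`RegionLoad` code, and the sketch assumes that each entry in the region map can end up pointing at a different server snapshot that carries its own full list of per-region loads. Under that assumption, the number of retained load objects grows with regions times snapshots, while storing a load-free copy retains none.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; these are NOT the real HBase 0.90 classes.
final class LoadRetentionSketch {

    /** Per-region load record (stand-in for RegionLoad). */
    static final class RegionLoadStub {
        final String regionName;
        RegionLoadStub(String regionName) { this.regionName = regionName; }
    }

    /** Server snapshot carrying a per-region load list (stand-in for HServerInfo plus its load). */
    static final class ServerInfoStub {
        final String serverName;
        final List<RegionLoadStub> regionLoads;

        ServerInfoStub(String serverName, List<RegionLoadStub> regionLoads) {
            this.serverName = serverName;
            this.regionLoads = regionLoads;
        }

        /** Returns a copy that keeps the server identity but drops the load list. */
        ServerInfoStub withoutLoad() {
            return new ServerInfoStub(serverName, Collections.<RegionLoadStub>emptyList());
        }
    }

    /** Builds a fresh snapshot with one load record per region, as a periodic report might. */
    static ServerInfoStub heartbeat(String serverName, int regionCount) {
        List<RegionLoadStub> loads = new ArrayList<RegionLoadStub>(regionCount);
        for (int i = 0; i < regionCount; i++) {
            loads.add(new RegionLoadStub("region-" + i));
        }
        return new ServerInfoStub(serverName, loads);
    }

    /** Assigns regions one at a time; each assignment stores the latest snapshot (or a load-free copy). */
    static Map<String, ServerInfoStub> assign(int regionCount, boolean clearLoad) {
        Map<String, ServerInfoStub> regionToServer = new HashMap<String, ServerInfoStub>();
        for (int i = 0; i < regionCount; i++) {
            ServerInfoStub snapshot = heartbeat("rs1", regionCount);
            regionToServer.put("region-" + i, clearLoad ? snapshot.withoutLoad() : snapshot);
        }
        return regionToServer;
    }

    /** Counts distinct load objects still reachable through the map values. */
    static long retainedLoads(Map<String, ServerInfoStub> regionToServer) {
        Set<RegionLoadStub> seen =
            Collections.newSetFromMap(new IdentityHashMap<RegionLoadStub, Boolean>());
        for (ServerInfoStub server : regionToServer.values()) {
            seen.addAll(server.regionLoads);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(retainedLoads(assign(100, false))); // prints 10000
        System.out.println(retainedLoads(assign(100, true)));  // prints 0
    }
}
```

Whether the real master map behaves this way depends on the 0.90 branch code, which is not shown in this thread; the sketch only illustrates why a load-free copy in the map would bound the retained objects.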

[jira] [Updated] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian zhang updated HBASE-3906:
------------------------------

    Attachment:     (was: HBASE-3906.patch)

[jira] [Updated] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian zhang updated HBASE-3906:
------------------------------

    Attachment: HBASE-3906.patch

[jira] [Updated] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian zhang updated HBASE-3906:
------------------------------

    Affects Version/s: 0.90.3

[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038061#comment-13038061 ] 

stack commented on HBASE-3906:
------------------------------

Jian: Yes, Andrew is asking how many of the 24M objects are not collectable by the JVM?  Does your heap analysis tool have a means of cleaning dead objects and only showing 'live objects'?
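
For reference, a small sketch of producing a dump that contains only live (reachable) objects. It uses the JDK's `com.sun.management.HotSpotDiagnosticMXBean`, whose `dumpHeap(file, live)` call dumps only reachable objects when `live` is true. Note the assumptions: this dumps the JVM the code runs in, not a remote HMaster process, and the class name `LiveHeapDump` and the output file name are made up for illustration. For an external process, jmap's `-dump:live,...` suboption serves the same purpose.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper; not part of HBase.
final class LiveHeapDump {

    /**
     * Writes a heap dump of the current JVM to the given file.
     * With liveOnly=true, HotSpot dumps only objects that are still reachable.
     * The target file must not already exist.
     */
    static void dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("live.hprof"); // hypothetical file name
        dump(file, true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting `.hprof` file can then be opened in a heap analyser such as MAT, as in the reproduction steps.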

[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037257#comment-13037257 ] 

Andrew Purtell commented on HBASE-3906:
---------------------------------------

How many of those "3G" of objects on the heap are live?

[jira] [Resolved] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3906.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch (Doesn't make sense on TRUNK).  Thanks for the patch Jian.

[jira] [Updated] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian zhang updated HBASE-3906:
------------------------------

    Attachment: HBASE-3906.patch

Please use this new attachment.

[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037816#comment-13037816 ] 

jian zhang commented on HBASE-3906:
-----------------------------------

1. Ted, this patch is only for the branch.
2. Andrew, in my HMaster dump there are 1481 HServerInfo and HServerLoad objects and 24,423,058 RegionLoad objects; one RegionLoad occupies 136 B. I'm not a native speaker, so I'm not sure I understand your question correctly. Can I take "live objects" to mean objects that cannot be garbage collected by the JVM? If so, I think all of these objects are live.
3. stack, I tested several scenarios; the patch works correctly, and I found no issues with the synchronize blocks.
Indeed, refreshing HServerInfo is not graceful enough, and I think balancing doesn't use the load of the HSI in the regions map. Following your suggestion, I cleared the load and patched my cluster to test; so far it works OK. I will test more scenarios and then provide a new patch for you to review again.
BTW, one HServerInfo object occupies about 350 B of memory even with the load cleared. If we don't use my ugly refreshing solution, then in the worst case each region needs its own HServerInfo object; if a big HBase cluster has 500,000 regions, the HServerInfo objects will occupy about 175,000,000 B of memory. Do you think this is acceptable?
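
The figures in this comment can be checked with a few lines of arithmetic. The sketch below only multiplies the object counts and per-object sizes quoted above; the class and method names are made up for illustration, and the sizes are the ones reported from the heap dump, not independently measured.

```java
// Illustrative arithmetic only; names are hypothetical.
final class FootprintEstimate {

    /** Total bytes for a number of objects of a given size; fails loudly on overflow. */
    static long totalBytes(long objectCount, long bytesPerObject) {
        return Math.multiplyExact(objectCount, bytesPerObject);
    }

    /** Converts bytes to binary gigabytes (GiB). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 24,423,058 RegionLoad objects at 136 B each (figures from the comment).
        long regionLoadBytes = totalBytes(24_423_058L, 136L);
        // Worst case from the comment: 500,000 regions, one 350 B HServerInfo each.
        long serverInfoBytes = totalBytes(500_000L, 350L);

        System.out.println("RegionLoad bytes:  " + regionLoadBytes);  // 3321535888
        System.out.println("RegionLoad GiB:    " + toGiB(regionLoadBytes));
        System.out.println("HServerInfo bytes: " + serverInfoBytes);  // 175000000
    }
}
```

The first product, about 3.3 billion bytes, is consistent with the "more than 3G" reported in the issue description; the second matches the 175,000,000 B worst case.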


[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038059#comment-13038059 ] 

stack commented on HBASE-3906:
------------------------------

Jian:  Thanks for trying out my suggestion.  I think 175M is fine if you have 500k regions.

[jira] [Updated] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "jian zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian zhang updated HBASE-3906:
------------------------------

    Description: 
1. Start the HBase cluster.
2. After HMaster finishes region assignment, use jmap to dump HMaster's memory.
3. Use MAT to analyse the dump file: there are too many RegionLoad instances, and they occupy more than 3G of memory.

  was:
1. Start the HBase cluster.
2. After HMaster finishes region assignment, use jmap to dump HMaster's memory.
3. Use MAT to analyse the dump file: there are too many HServerLoad instances, and they occupy more than 3G of memory.

        Summary: When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.  (was: When HMaster is running,there are a lot of HServerLoad instances（far greater than the regions）,it has risk of OOME.)

[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances（far greater than the regions）,it has risk of OOME.

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037000#comment-13037000 ] 

Ted Yu commented on HBASE-3906:
-------------------------------

The patch wouldn't apply to trunk, where the heartbeat has been removed.
