Posted to issues@hbase.apache.org by "Dave Latham (JIRA)" <ji...@apache.org> on 2011/07/02 01:51:28 UTC

[jira] [Created] (HBASE-4055) Client region location caches redundant HTableDescriptor's

Client region location caches redundant HTableDescriptor's
----------------------------------------------------------

                 Key: HBASE-4055
                 URL: https://issues.apache.org/jira/browse/HBASE-4055
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.3
            Reporter: Dave Latham
             Fix For: 0.92.0


While examining the heap of a map task in a MapReduce job that writes directly to HBase, I noticed that the HRegionLocation instances were taking up 90 MB (out of a 700 MB heap for each map task) to cache the locations for 15K regions.  As the number of regions in the cluster continues to grow, the size of this cache grows with it.

Of that, it appears that about 80 MB were going to 15K HTableDescriptor instances.  There are only 5 tables that it's writing to, so it seems to be wasting a great deal of memory with a separate copy of the table descriptor for each region.
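
To illustrate the numbers above: 80 MB across 15K instances is roughly 5 KB per descriptor, and with only 5 tables all but 5 of those copies are redundant. The following is a minimal sketch of sharing one descriptor per table among cached locations. TableDescriptor, RegionLocation, intern and countDistinct are hypothetical stand-ins invented for this example, not the HBase classes or API, and the sketch assumes two descriptors for the same table name are interchangeable. It is not the fix discussed in this issue.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: these classes are hypothetical stand-ins,
// not the HBase HTableDescriptor / HRegionLocation classes.
final class DescriptorSharingSketch {

    /** Hypothetical table descriptor, identified here by table name alone. */
    static final class TableDescriptor {
        final String tableName;
        TableDescriptor(String tableName) { this.tableName = tableName; }
    }

    /** Hypothetical cached region location that holds a descriptor reference. */
    static final class RegionLocation {
        final String regionName;
        final TableDescriptor descriptor;
        RegionLocation(String regionName, TableDescriptor descriptor) {
            this.regionName = regionName;
            this.descriptor = descriptor;
        }
    }

    // One canonical descriptor per table name.
    private final Map<String, TableDescriptor> canonical =
        new HashMap<String, TableDescriptor>();

    /** Returns the first descriptor seen for this table; later copies are dropped. */
    TableDescriptor intern(TableDescriptor d) {
        TableDescriptor existing = canonical.putIfAbsent(d.tableName, d);
        return existing != null ? existing : d;
    }

    /** Builds region locations and counts the distinct descriptor instances they retain. */
    static int countDistinct(int tables, int regions, boolean share) {
        DescriptorSharingSketch pool = new DescriptorSharingSketch();
        // Identity-based set: counts object instances, not equal values.
        Set<TableDescriptor> instances =
            Collections.newSetFromMap(new IdentityHashMap<TableDescriptor, Boolean>());
        for (int r = 0; r < regions; r++) {
            // Each region arrives with its own freshly created descriptor copy.
            TableDescriptor copy = new TableDescriptor("table" + (r % tables));
            RegionLocation loc =
                new RegionLocation("region" + r, share ? pool.intern(copy) : copy);
            instances.add(loc.descriptor);
        }
        return instances.size();
    }

    public static void main(String[] args) {
        System.out.println("unshared: " + countDistinct(5, 15000, false));
        System.out.println("shared:   " + countDistinct(5, 15000, true));
    }
}
```

With 5 tables and 15K regions, the unshared case retains 15000 descriptor instances and the shared case retains 5.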

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4055) Client region location caches redundant HTableDescriptor's

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060191#comment-13060191 ] 

stack commented on HBASE-4055:
------------------------------

Andrew, I think 451 in 0.90 is too big a change (it's not finished yet on trunk).  The fix by the lads from Huawei should do in the meantime.  We just need to push out 0.90.4.


        

[jira] [Resolved] (HBASE-4055) Client region location caches redundant HTableDescriptor's

Posted by "Dave Latham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham resolved HBASE-4055.
--------------------------------

    Resolution: Duplicate

Ah, that should probably do it, Stack.  Thanks for the work.  Sorry for the duplicate issue.

It seems like too risky a change for the branch to me.  I look forward to seeing 0.92.


        

[jira] [Commented] (HBASE-4055) Client region location caches redundant HTableDescriptor's

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058877#comment-13058877 ] 

stack commented on HBASE-4055:
------------------------------

HRegionInfo no longer carries an HTableDescriptor instance in TRUNK.  Will that fix it, Dave?  In the branch there is HBASE-3906.  Does that help?


        

[jira] [Commented] (HBASE-4055) Client region location caches redundant HTableDescriptor's

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058887#comment-13058887 ] 

Andrew Purtell commented on HBASE-4055:
---------------------------------------

bq. HRegionInfo no longer carries an HTableDescriptor instance in TRUNK

Maybe we should do this in 0.90 too? ... assuming we do the right thing with actually returning valid HTDs from methods as if nothing has changed behind the curtain.
