Posted to dev@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2015/06/26 23:45:04 UTC

[jira] [Created] (AMBARI-12178) Memory Exhausted During Upgrade Of Large Cluster

Jonathan Hurley created AMBARI-12178:
----------------------------------------

             Summary: Memory Exhausted During Upgrade Of Large Cluster
                 Key: AMBARI-12178
                 URL: https://issues.apache.org/jira/browse/AMBARI-12178
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.1.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.1.0


During an upgrade of a large cluster, the memory used by Ambari grows until the heap is exhausted. This happens only while the Upgrade Dialog page is open; if that popup is closed, the memory usage stays relatively constant.

The offending call is:
{code}
api/v1/clusters/perf400/upgrades/31?upgrade_groups/UpgradeGroup/status!=PENDING&fields=Upgrade/progress_percent,Upgrade/request_context,Upgrade/request_status,Upgrade/direction,upgrade_groups/UpgradeGroup,upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/context,upgrade_groups/upgrade_items/UpgradeItem/group_id,upgrade_groups/upgrade_items/UpgradeItem/progress_percent,upgrade_groups/upgrade_items/UpgradeItem/request_id,upgrade_groups/upgrade_items/UpgradeItem/skippable,upgrade_groups/upgrade_items/UpgradeItem/stage_id,upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/text&minimal_response=true
{code}
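
To make the request above easier to read, here is a minimal sketch that splits its query string into the individual {{fields}} entries. The class and method names are hypothetical and not part of Ambari; only the query string itself is taken from the call above. It shows that the dialog asks for 14 fields, most of them on {{upgrade_groups/upgrade_items}}, and that {{UpgradeItem/status}} is requested twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for reading the request; not Ambari code.
final class UpgradeQueryFields {

    // Query string copied from the offending call (everything after '?').
    static final String QUERY =
        "upgrade_groups/UpgradeGroup/status!=PENDING"
        + "&fields=Upgrade/progress_percent,Upgrade/request_context,"
        + "Upgrade/request_status,Upgrade/direction,"
        + "upgrade_groups/UpgradeGroup,"
        + "upgrade_groups/upgrade_items/UpgradeItem/status,"
        + "upgrade_groups/upgrade_items/UpgradeItem/context,"
        + "upgrade_groups/upgrade_items/UpgradeItem/group_id,"
        + "upgrade_groups/upgrade_items/UpgradeItem/progress_percent,"
        + "upgrade_groups/upgrade_items/UpgradeItem/request_id,"
        + "upgrade_groups/upgrade_items/UpgradeItem/skippable,"
        + "upgrade_groups/upgrade_items/UpgradeItem/stage_id,"
        + "upgrade_groups/upgrade_items/UpgradeItem/status,"
        + "upgrade_groups/upgrade_items/UpgradeItem/text"
        + "&minimal_response=true";

    /** Returns the comma-separated entries of the "fields" parameter, in order. */
    static List<String> fields(String query) {
        List<String> out = new ArrayList<>();
        for (String param : query.split("&")) {
            if (param.startsWith("fields=")) {
                String value = param.substring("fields=".length());
                out.addAll(Arrays.asList(value.split(",")));
            }
        }
        return out;
    }

    /** Returns the same entries with duplicates dropped, keeping first-seen order. */
    static Set<String> distinct(List<String> fields) {
        return new LinkedHashSet<>(fields);
    }

    public static void main(String[] args) {
        List<String> all = fields(QUERY);
        System.out.println("requested fields: " + all.size());
        System.out.println("distinct fields:  " + distinct(all).size());
        for (String f : distinct(all)) {
            System.out.println("  " + f);
        }
    }
}
```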

Based on heap dumps, the largest offenders are {{StageEntity}} and, as a result, {{byte[]}}:

{noformat}
Class Name| Objects |  Shallow Heap | Retained Heap
----------------------------------------------------
byte[]    | 351,907 | 3,147,710,224 |              
----------------------------------------------------

Class Name                                         | Objects | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity  | 192,356 |   18,466,176 | 3,075,693,136
org.apache.ambari.server.orm.entities.StageEntity_ |       0 |            0 |              
org.apache.ambari.server.orm.entities.StageEntityPK|       0 |            0 |              
--------------------------------------------------------------------------------------------
{noformat}

Each {{StageEntity}} is holding about 30 KB, most of it in {{clusterHostInfo}}:
{noformat}
Class Name                                                                                                                                                                                                                                                                                                       | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity @ 0x738e03260                                                                                                                                                                                                                                                  |           96 |        28,576
|- <class> class org.apache.ambari.server.orm.entities.StageEntity @ 0x64058d268                                                                                                                                                                                                                                 |            8 |             8
|- skippable java.lang.Integer @ 0x6401e9738  0                                                                                                                                                                                                                                                                  |           16 |            16
|- clusterId java.lang.Long @ 0x64026c908  2                                                                                                                                                                                                                                                                     |           24 |            24
|- requestId java.lang.Long @ 0x64026d840  31                                                                                                                                                                                                                                                                    |           24 |            24
|- _persistence_primaryKey org.eclipse.persistence.internal.identitymaps.CacheId @ 0x642ce20e0                                                                                                                                                                                                                   |           24 |            48
|- _persistence_cacheKey org.eclipse.persistence.internal.identitymaps.HardCacheWeakIdentityMap$ReferenceCacheKey @ 0x6469cf328                                                                                                                                                                                  |          104 |           136
|- request org.apache.ambari.server.orm.entities.RequestEntity @ 0x728d046e8                                                                                                                                                                                                                                     |          112 |           432
|- _persistence_listener org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener @ 0x72f073f20                                                                                                                                                                                       |           32 |            32
|- stageId java.lang.Long @ 0x7350c8b08  1199                                                                                                                                                                                                                                                                    |           24 |            24
|- logInfo java.lang.String @ 0x7350c8b20  /tmp/ambari                                                                                                                                                                                                                                                           |           24 |            64
|- requestContext java.lang.String @ 0x7350c8b38  Restarting DataNode on perf400-c-371.c.pramod-thangali.internal                                                                                                                                                                                                |           24 |           168
|- hostRoleCommands org.eclipse.persistence.indirection.IndirectList @ 0x738a0ceb0                                                                                                                                                                                                                               |           64 |           184
|- roleSuccessCriterias org.eclipse.persistence.indirection.IndirectList @ 0x738a0cef0                                                                                                                                                                                                                           |           64 |           184
|- commandParamsStage byte[141] @ 0x738c46cc8  {"restart_type":"rolling_upgrade","upgrade_direction":"upgrade","version":"2.2.6.0-2799","target_stack":"HDP-2.2","original_stack":"HDP-2.2"}                                                                                                                     |          160 |           160
|- hostParamsStage byte[776] @ 0x738dc16b0  {"ambari_db_rca_driver":"org.postgresql.Driver","ambari_db_rca_password":"mapred","ambari_db_rca_url":"jdbc:postgresql://perf400-a-1.c.pramod-thangali.internal/ambarirca","ambari_db_rca_username":"mapred","current_version":"2.2.0.0-2041","db_driver_filenam...  |          792 |           792
|- clusterHostInfo byte[26774] @ 0x739006378  {"nimbus_hosts":["278"],"all_racks":["/default-rack:0-405"],"ambari_server_host":["perf400-a-1.c.pramod-thangali.internal"],"app_timeline_server_hosts":["138"],"hive_mysql_host":["247"],"falcon_server_hosts":["2"],"hbase_master_hosts":["2"],"accumulo_maste...|       26,792 |        26,792
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{noformat}
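
The figures in the two heap tables above can be cross-checked with a little arithmetic. The sketch below is illustrative only (the class and method names are made up); the numbers are copied from the dumps. It computes the average retained heap per {{StageEntity}} across all 192,356 instances, and the share of the sampled entity's retained heap that belongs to {{clusterHostInfo}}.

```java
// Hypothetical helper; the constants are taken from the heap dump tables.
final class HeapDumpArithmetic {

    /** Integer average of bytes per object. */
    static long averageBytes(long totalBytes, long objectCount) {
        return totalBytes / objectCount;
    }

    /** Percentage of 'whole' taken up by 'part'. */
    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long stageEntityRetained = 3_075_693_136L; // retained heap of all StageEntity objects
        long stageEntityCount = 192_356L;          // number of StageEntity objects
        long sampleRetained = 28_576L;             // retained heap of the sampled StageEntity
        long sampleClusterHostInfo = 26_792L;      // retained heap of its clusterHostInfo byte[]

        System.out.println("average retained bytes per StageEntity: "
            + averageBytes(stageEntityRetained, stageEntityCount));
        System.out.printf("clusterHostInfo share of the sampled entity: %.1f%%%n",
            percent(sampleClusterHostInfo, sampleRetained));
    }
}
```

The average comes out at roughly 16 KB per entity, lower than the sampled instance, so not every {{StageEntity}} carries a {{clusterHostInfo}} payload of the same size. In the sample, that one field accounts for well over 90% of the retained heap.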

It appears as though a local {{Cache}} in [ActionDBAccessorImpl|https://github.com/apache/ambari/blob/94c091e280a99e07db5f3910873e70aa3c18394f/ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java#L104] is holding on to these objects:
{noformat:title=Shows the cache holding onto a HostEntity which holds onto a UnitOfWork map with lots of stale entities}
Class Name                                                                                                                                                 | Ref. Objects | Shallow Heap | Ref. Shallow Heap | Retained Heap
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
java.lang.Thread @ 0x641af65b8  ambari-action-scheduler Native Stack, Thread                                                                               |           76 |          120 |             7,296 |     4,960,776
|- <Java Local> org.apache.ambari.server.actionmanager.ActionDBAccessorImpl$$EnhancerByGuice$$dcf333e8 @ 0x640538f40                                       |           75 |          248 |             7,200 |   640,497,232
|  '- hostRoleCommandCache com.google.common.cache.LocalCache$LocalManualCache @ 0x640474b58                                                               |           75 |           16 |             7,200 |   640,496,984
|     '- localCache com.google.common.cache.LocalCache @ 0x640da1650                                                                                       |           75 |          128 |             7,200 |   640,496,968
|        '- segments com.google.common.cache.LocalCache$Segment[4] @ 0x640f27e88                                                                           |           75 |           32 |             7,200 |   640,496,840
|           |- [1] com.google.common.cache.LocalCache$Segment @ 0x6410ee3c8                                                                                |           22 |           80 |             2,112 |   151,456,800
|           |  |- table java.util.concurrent.atomic.AtomicReferenceArray @ 0x6470826f8                                                                     |           21 |           16 |             2,016 |         2,080
|           |  |  '- array java.lang.Object[512] @ 0x65dd9e088                                                                                             |           21 |        2,064 |             2,016 |         2,064
|           |  |     |- [346] com.google.common.cache.LocalCache$StrongAccessEntry @ 0x670caa3d0                                                           |            1 |           48 |                96 |     2,854,000
|           |  |     |  '- valueReference com.google.common.cache.LocalCache$StrongValueReference @ 0x670caa418                                            |            1 |           16 |                96 |     2,853,928
|           |  |     |     '- referent org.apache.ambari.server.actionmanager.HostRoleCommand @ 0x670caa430                                                |            1 |          128 |                96 |     2,853,912
|           |  |     |        '- hostEntity org.apache.ambari.server.orm.entities.HostEntity @ 0x66f876d18                                                 |            1 |          136 |                96 |     2,827,496
|           |  |     |           '- _persistence_listener org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener @ 0x66f89f530|            1 |           32 |                96 |            32
|           |  |     |              '- uow org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork @ 0x670ca0b30                               |            1 |          360 |                96 |     2,826,496
|           |  |     |                 '- identityMapAccessor org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor @ 0x66f7fbf38        |            1 |           24 |                96 |     2,825,688
|           |  |     |                    '- identityMapManager org.eclipse.persistence.internal.identitymaps.IdentityMapManager @ 0x670c2b320             |            1 |           48 |                96 |     2,825,664
|           |  |     |                       '- identityMaps java.util.HashMap @ 0x670c2b350                                                               |            1 |           48 |                96 |     2,824,208
|           |  |     |                          '- table java.util.HashMap$Node[32] @ 0x670cb1608                                                          |            1 |          144 |                96 |     2,824,160
|           |  |     |                             '- [5] java.util.HashMap$Node @ 0x670b71bd8                                                             |            1 |           32 |                96 |     1,201,192
|           |  |     |                                '- value org.eclipse.persistence.internal.identitymaps.UnitOfWorkIdentityMap @ 0x670c5a390           |            1 |           32 |                96 |     1,201,160
|           |  |     |                                   '- cacheKeys java.util.HashMap @ 0x670c2b4d0                                                      |            1 |           48 |                96 |     1,201,128
|           |  |     |                                      '- table java.util.HashMap$Node[4096] @ 0x66f7c83c8                                            |            1 |       16,400 |                96 |     1,201,080
|           |  |     |                                         '- [3271] java.util.HashMap$Node @ 0x670c772e8                                              |            1 |           32 |                96 |           200
|           |  |     |                                            '- value org.eclipse.persistence.internal.identitymaps.CacheKey @ 0x66f756e30            |            1 |           96 |                96 |            96
|           |  |     |                                               '- object org.apache.ambari.server.orm.entities.StageEntity @ 0x66f4f6f98             |            1 |           96 |                96 |           568
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

{noformat}
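
The retention chain in the dump above (cache entry, then {{HostRoleCommand}}, then {{HostEntity}}, then the unit of work and its identity maps) can be illustrated with a small stand-alone sketch. Everything here is hypothetical and uses only the JDK: {{PersistenceContext}}, {{ManagedHost}} and {{HostSnapshot}} are stand-ins for the EclipseLink and Ambari classes, not their real APIs, and this is not necessarily the fix applied for this issue. It shows two general mitigations under those assumptions: caching a detached copy of the plain values instead of the managed object, and bounding the cache size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only; none of these types exist in Ambari or EclipseLink.
final class CacheRetentionSketch {

    /** Stand-in for a unit of work that tracks every entity it has loaded. */
    static final class PersistenceContext {
        final List<byte[]> trackedEntities = new ArrayList<>();
    }

    /** Stand-in for a managed entity: it keeps a back-reference to its context. */
    static final class ManagedHost {
        final String hostName;
        final PersistenceContext context;

        ManagedHost(String hostName, PersistenceContext context) {
            this.hostName = hostName;
            this.context = context;
        }
    }

    /** Detached copy: plain values only, so caching it cannot pin a context. */
    static final class HostSnapshot {
        final String hostName;

        HostSnapshot(ManagedHost managed) {
            this.hostName = managed.hostName;
        }
    }

    /** Size-bounded LRU map built on LinkedHashMap's access-order mode. */
    static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        PersistenceContext context = new PersistenceContext();
        context.trackedEntities.add(new byte[26_774]); // e.g. one stage's clusterHostInfo
        ManagedHost managed = new ManagedHost("host-1", context);

        // Caching 'managed' directly would keep 'context' and everything it
        // tracks reachable for as long as the cache entry lives.
        Map<Long, HostSnapshot> cache = boundedLru(2);
        cache.put(1L, new HostSnapshot(managed));
        cache.put(2L, new HostSnapshot(managed));
        cache.put(3L, new HostSnapshot(managed)); // evicts the least recently used key

        System.out.println("cached keys: " + cache.keySet());
    }
}
```

The design point is that a cache with strong value references keeps alive whatever its values can reach. If a cached value reaches a persistence context, every entity that context has registered stays on the heap as well.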




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)