Posted to notifications@asterixdb.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/10/31 19:58:01 UTC

[jira] [Commented] (ASTERIXDB-2081) Failed to restart after hit an OOM issue

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227418#comment-16227418 ] 

ASF subversion and git services commented on ASTERIXDB-2081:
------------------------------------------------------------

Commit 7ea84894b055289d46a3c4761748411574906f25 in asterixdb's branch refs/heads/master from [~mhubail]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=7ea8489 ]

[ASTERIXDB-2081][STO] Introduce DatasetMemoryManager

- user model changes: no
- storage format changes: no
- interface changes: yes
  Added IDatasetMemoryManager to manage datasets memory
  reservation and allocation.

Details:
- Reserve metadata datasets memory to allow them to be opened
  when needed.
- Add UngracefulShutdownNCApplication to force recovery
  to run on AsterixHyracksIntegrationUtil.
- Refactor the use of firstAvilableUserDatasetID to check
  for metadata datasets.
- Add ThreadSafe annotation.
- Add test case for RecoveryManager after creating multiple
  datasets.

Change-Id: Ica76b3c8eca6f7d2ad1d962fb5ef84267c258571
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2112
Sonar-Qube: Jenkins <je...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Contrib: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Michael Blow <mb...@apache.org>
Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
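
The reservation idea in the commit details above (setting memory aside so that metadata datasets can always be opened) can be illustrated with a minimal sketch. This is not the IDatasetMemoryManager interface from the commit: the class name DatasetMemoryBudget, its methods, and the firstUserDatasetId threshold are hypothetical names chosen for illustration, and the sketch assumes that dataset IDs below the threshold belong to metadata datasets.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a memory budget with a reserved metadata share.
 * It is NOT the AsterixDB IDatasetMemoryManager API.
 */
final class DatasetMemoryBudget {
    private final long userBudget;        // total budget minus the metadata reservation
    private final long metadataReserved;  // set aside up front for metadata datasets
    private final int firstUserDatasetId; // assumption: smaller IDs are metadata datasets
    private final Map<Integer, Long> allocations = new HashMap<>();
    private long userAllocated;
    private long metadataAllocated;

    DatasetMemoryBudget(long totalBudget, long metadataReserved, int firstUserDatasetId) {
        if (metadataReserved < 0 || metadataReserved > totalBudget) {
            throw new IllegalArgumentException("reservation must fit in the total budget");
        }
        this.userBudget = totalBudget - metadataReserved;
        this.metadataReserved = metadataReserved;
        this.firstUserDatasetId = firstUserDatasetId;
    }

    private boolean isMetadata(int datasetId) {
        return datasetId < firstUserDatasetId;
    }

    /** Returns true if the allocation fits; user datasets never touch the reserved share. */
    synchronized boolean allocate(int datasetId, long bytes) {
        if (isMetadata(datasetId)) {
            if (metadataAllocated + bytes > metadataReserved) {
                return false;
            }
            metadataAllocated += bytes;
        } else {
            if (userAllocated + bytes > userBudget) {
                return false;
            }
            userAllocated += bytes;
        }
        allocations.merge(datasetId, bytes, Long::sum);
        return true;
    }

    /** Releases everything held by the given dataset, if anything. */
    synchronized void deallocate(int datasetId) {
        Long bytes = allocations.remove(datasetId);
        if (bytes == null) {
            return;
        }
        if (isMetadata(datasetId)) {
            metadataAllocated -= bytes;
        } else {
            userAllocated -= bytes;
        }
    }

    /** Memory still available to user datasets. */
    synchronized long availableForUsers() {
        return userBudget - userAllocated;
    }
}
```

With a total budget of 100 and a reservation of 30, user datasets in this sketch can hold at most 70, so a metadata dataset can still be opened even when user datasets have exhausted their share.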


> Failed to restart after hit an OOM issue
> ----------------------------------------
>
>                 Key: ASTERIXDB-2081
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2081
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: STO - Storage
>         Environment: master
>            Reporter: Jianfeng Jia
>            Assignee: Murtadha Hubail
>
> One of the nodes failed due to an OOM error. When we then tried to restart the service, the node could not be recovered, and the logs showed the following:
> {code}
> WARNING: Error in application message delivery!
> java.lang.IllegalStateException: Failed to redo
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
>     at org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
>     at org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
>     at org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
>     at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
>     at org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
>     at org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
>     at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset 245 memory since memory budget would be exceeded.
>     at org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
>     at org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
>     at org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
>     at org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
>     at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
>     at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
>     ... 8 more
> Sep 05, 2017 3:37:46 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> WARNING: Exception while executing ApplicationMessage: nodeID: 4
> java.lang.RuntimeException: java.lang.IllegalStateException: Failed to redo
>     at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:60)
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: java.lang.IllegalStateException: Failed to redo
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
>     at org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
>     at org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
>     at org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
>     at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
>     at org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
>     at org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
>     at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
>     ... 1 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset 245 memory since memory budget would be exceeded.
>     at org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
>     at org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
>     at org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
>     at org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
>     at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
>     at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
>     ... 8 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)