You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Jianfeng Jia (JIRA)" <ji...@apache.org> on 2017/09/05 23:10:00 UTC

[jira] [Commented] (ASTERIXDB-2081) Failed to restart after hit an OOM issue

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154534#comment-16154534 ] 

Jianfeng Jia commented on ASTERIXDB-2081:
-----------------------------------------

And if anyone knows a hack solution to restart the service, e.g., deleting the txn log, it will be much appreciated!

> Failed to restart after hit an OOM issue
> ----------------------------------------
>
>                 Key: ASTERIXDB-2081
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2081
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: STO - Storage
>         Environment: master
>            Reporter: Jianfeng Jia
>
> One of the node was failed due to the OOM error. Then when we try to restart the service, the node couldn't be recovered and the logs is shown as below:
> {code}
> WARNING: Error in application message delivery!
> java.lang.IllegalStateException: Failed to redo
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
>     at org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
>     at org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
>     at org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
>     at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
>     at org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
>     at org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
>     at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset 245 memory since memory budget would be exceeded.
>     at org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
>     at org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
>     at org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
>     at org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
>     at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
>     at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
>     ... 8 more
> Sep 05, 2017 3:37:46 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> WARNING: Exception while executing ApplicationMessage: nodeID: 4
> java.lang.RuntimeException: java.lang.IllegalStateException: Failed to redo
>     at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:60)
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: java.lang.IllegalStateException: Failed to redo
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
>     at org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
>     at org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
>     at org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
>     at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
>     at org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
>     at org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
>     at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
>     ... 1 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset 245 memory since memory budget would be exceeded.
>     at org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
>     at org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
>     at org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
>     at org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
>     at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
>     at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
>     at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
>     ... 8 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)