Posted to issues@activemq.apache.org by "Gary Tully (JIRA)" <ji...@apache.org> on 2018/10/17 10:44:00 UTC

[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown

    [ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653349#comment-16653349 ] 

Gary Tully commented on AMQ-6590:
---------------------------------

It turns out that this change, while good, means that we take a hit on start in the normal failover case where the primary dies uncleanly.

There have been reports of startup taking more than 2 minutes, with the threads stuck in SequenceSet.add. AMQ-7055 helps a good bit, but the underlying problem is that we are trading availability for disk usage and taking the hit during restart.

I am thinking it may be better to checkpoint the freeList when we do gc (the cleanup phase) and accept that information on restart.

If the restart is unclean, we remember that and do a freeList recovery when we next do an orderly shutdown. That way we can still restart fast, lose some disk space to missed free pages, and recover gracefully when we are stopping.
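
To make the idea concrete, here is a rough sketch of the flow, written as a small in-memory model. Every name in it (FreeListCheckpointSketch, Disk, gcCheckpoint, freeListRecoveryPending) is hypothetical and invented for illustration; none of it is KahaDB's actual PageFile API. It also deliberately ignores pages that get reused between the last checkpoint and a crash, which a real implementation would have to deal with.

```java
import java.util.TreeSet;

/** Hypothetical in-memory model of the idea; none of these names are KahaDB APIs. */
final class FreeListCheckpointSketch {

    /** Stands in for state that survives a restart. */
    static final class Disk {
        long pageCount;
        // Assumption: what a full page-file scan would find as in use.
        final TreeSet<Long> pagesInUse = new TreeSet<>();
        TreeSet<Long> checkpointedFreeList = new TreeSet<>();
        boolean cleanShutdown = true;
        boolean freeListRecoveryPending = false;
    }

    private final Disk disk;
    private final TreeSet<Long> freeList;

    /** Start: trust the last checkpoint instead of scanning every page. */
    FreeListCheckpointSketch(Disk disk) {
        this.disk = disk;
        if (!disk.cleanShutdown) {
            disk.freeListRecoveryPending = true; // remember the unclean restart
        }
        disk.cleanShutdown = false;              // stays false unless stop() runs
        this.freeList = new TreeSet<>(disk.checkpointedFreeList);
    }

    /** Release a page; the in-memory free list is not persisted here. */
    void free(long page) {
        disk.pagesInUse.remove(page);
        freeList.add(page);
    }

    /** Called from the gc/cleanup phase: persist the current free list. */
    void gcCheckpoint() {
        disk.checkpointedFreeList = new TreeSet<>(freeList);
    }

    /** Orderly shutdown: pay for the full scan here, and only if one is owed. */
    void stop() {
        if (disk.freeListRecoveryPending) {
            freeList.clear();
            for (long p = 0; p < disk.pageCount; p++) {
                if (!disk.pagesInUse.contains(p)) {
                    freeList.add(p);
                }
            }
            disk.freeListRecoveryPending = false;
        }
        disk.checkpointedFreeList = new TreeSet<>(freeList);
        disk.cleanShutdown = true;
    }

    TreeSet<Long> freeList() {
        return freeList;
    }
}
```

In this model, a page freed after the last gc checkpoint is simply missing from the free list after a crash, and it reappears once the next orderly stop() runs the deferred scan.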

[~cshannon] I wonder if that will hold together? The other approach is to have the option to do this offline; offline work has always been on the todo list.

> KahaDB index loses track of free pages on unclean shutdown
> ----------------------------------------------------------
>
>                 Key: AMQ-6590
>                 URL: https://issues.apache.org/jira/browse/AMQ-6590
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.14.3
>            Reporter: Christopher L. Shannon
>            Assignee: Christopher L. Shannon
>            Priority: Major
>             Fix For: 5.15.0, 5.14.4
>
>
> I have discovered an issue with the KahaDB index recovery after an unclean shutdown (OOM error, kill -9, etc.) that leads to excessive disk space usage.
> Normally on clean shutdown the index stores the known set of free pages to db.free and reads that in on start up to know which pages can be re-used.  On an unclean shutdown this is not written to disk, so on start up the index is supposed to scan the page file to figure out all of the free pages.
> Unfortunately it turns out that this scan of the page file is being done before the total page count value has been set, so when the iterator is created it always thinks there are 0 pages to scan.
> The end result is that every time an unclean shutdown occurs all known free pages are lost and no longer tracked.  This of course means new free pages have to be allocated and all of the existing space is now lost, which will lead to excessive index file growth over time.
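
The ordering problem in the quoted description can be pictured with a toy model. This is only a sketch: PageScanOrderSketch, loadPageCount and scanForFreePages are hypothetical names for illustration and do not correspond to the real KahaDB code. The point is just that a scan bounded by a page count that is still 0 finds nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of the scan-ordering bug; not KahaDB code. */
final class PageScanOrderSketch {
    private final boolean[] freeOnDisk; // per-page "free" marker as stored on disk
    private long pageCount = 0;         // not known until loadPageCount() runs

    PageScanOrderSketch(boolean[] freeOnDisk) {
        this.freeOnDisk = freeOnDisk;
    }

    /** Sets the total page count; the scan is only meaningful after this. */
    void loadPageCount() {
        pageCount = freeOnDisk.length;
    }

    /** Scans pages [0, pageCount) and collects those marked free. */
    List<Long> scanForFreePages() {
        List<Long> free = new ArrayList<>();
        for (long p = 0; p < pageCount; p++) {
            if (freeOnDisk[(int) p]) {
                free.add(p);
            }
        }
        return free;
    }
}
```

Calling scanForFreePages() before loadPageCount() returns an empty list regardless of what is on disk; calling them in the other order finds the free pages.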


