Posted to dev@activemq.apache.org by "Gary Tully (JIRA)" <ji...@apache.org> on 2009/11/05 17:33:54 UTC

[jira] Resolved: (AMQ-2478) Too many files open error, after no space left on device occurs; if producer carries on sending messages.

     [ https://issues.apache.org/activemq/browse/AMQ-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Tully resolved AMQ-2478.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 5.4.0

Patch applied with thanks in r833074. I left the test case with the JIRA issue, as it requires a small disk. The fix also works OK with the new KahaDB, which is good.

> Too many files open error, after no space left on device occurs; if producer carries on sending messages.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-2478
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2478
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.3.0
>         Environment: MacOSX 10.6.1, fusesource broker 5.3.0.4.
>            Reporter: Dominic Tootell
>            Assignee: Gary Tully
>             Fix For: 5.4.0
>
>         Attachments: DataFile.java, patchfile.txt, TooManyFilesTest.java
>
>
> The problem seems to be that once the persistence store (disk) has run out of space, if the producer keeps on sending messages to the broker, the broker ends up using up the file descriptors for the process (default 1024), and you get the error "too many open files".  The only way to fix this is a broker restart.
> 1) Producer is sending to the broker
> 2) Disk Space on the broker runs out
> 3) The producer gets the error:
> [2009.11.02 23:05:30] [main] INFO  ProducerTool -  Sent Message:
> [18973 : ^@^@OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO...], took: 1ms
> [2009.11.02 23:05:30] [main] WARN  ProducerTool -  Error sending
> message:18974 : ^@^@OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO...
> javax.jms.JMSException: No space left on device
>       at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49)
>       at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1255)
> 4)  The broker gets the error:
> {code}
> DEBUG Service                        - Error occured while processing
> async command: MessageAck {commandId = 53297, responseRequired =
> false, ackType = 2, consumerId =
> ID:dominic-tootells-macbook-pro.local-57138-1257203010059-0:0:-1:2,
> firstMessageId =
> ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:17751,
> lastMessageId =
> ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:17751,
> destination = queue://iplayer, transactionId =
> TX:ID:dominic-tootells-macbook-pro.local-57138-1257203010059-0:0:17751,
> messageCount = 1}, exception: java.io.IOException: No space left on
> device
> java.io.IOException: No space left on device
>        at java.io.RandomAccessFile.setLength(Native Method)
> {code}
> 5) All is good if you spot this and clear up some space quickly;
> both the broker and the producer recover and can carry on.
> However, if you don't notice and react quickly enough, and the producer
> keeps on sending messages to the broker, then the broker ends up with the
> error "too many open files":
> {code}
> Id = ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:35920,
> lastMessageId =
> ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:35920,
> destination = queue://iplayer, transactionId =
> TX:ID:dominic-tootells-macbook-pro.local-57138-1257203010059-0:0:52674,
> messageCount = 1}, exception: java.io.FileNotFoundException:
> /Volumes/SSD/data/journal/data-4 (Too many open files)
> java.io.FileNotFoundException: /Volumes/SSD/data/journal/data-4 (Too
> many open files)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>        at org.apache.activemq.kaha.impl.async.DataFile.openRandomAccess
> {code}
> I tried the following combinations:
> - no failover protocol
> - no sendFailIfNoSpace being sent to the producer
> - recreating the producer connection after the error
> - no consumer attached to the broker
> In the end I attached JProfiler to the broker (via a small junit), and
> noticed that upon the "No space left on device" error the number of
> File objects and FileDescriptor objects would grow, and not shrink.
> Upon looking at the below stack trace:
> {code}
> Caused by: java.io.IOException: No space left on device
>        at java.io.RandomAccessFile.setLength(Native Method)
>        at org.apache.activemq.kaha.impl.async.DataFile.openRandomAccessFile(DataFile.java:96)
>        at org.apache.activemq.kaha.impl.async.AsyncDataManager.allocateLocation(AsyncDataManager.java:276)
>        at org.apache.activemq.kaha.impl.async.DataFileAppender.storeItem(DataFileAppender.java:169)
>        at org.apache.activemq.kaha.impl.async.AsyncDataManager.write(AsyncDataManager.java:647)
>        at org.apache.activemq.store.amq.AMQPersistenceAdapter.writeCommand(AMQPersistenceAdapter.java:697)
>        at org.apache.activemq.store.amq.AMQPersistenceAdapter.writeCommand(AMQPersistenceAdapter.java:693)
>        at org.apache.activemq.store.amq.AMQMessageStore.addMessage(AMQMessageStore.java:106)
>        at org.apache.activemq.broker.region.Queue.doMessageSend(Queue.java:503)
>        at org.apache.activemq.broker.region.Queue.send(Queue.java:480)
>        at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:354)
>        at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:443)
>        at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:224)
>        at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:95)
>        at org.apache.activemq.broker.MutableBrokerFilter.send(MutableBrokerFilter.java:133)
>        at org.apache.activemq.broker.TransportConnection.processMessage(TransportConnection.java:455)
>        at org.apache.activemq.command.ActiveMQMessage.visit(ActiveMQMessage.java:639)
>        at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:308)
>        at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:182)
>        at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:68)
>        at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
>        at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:210)
>        at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:84)
>        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
>        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:185)
>        at java.lang.Thread.run(Thread.java:637)
> {code}
> I took a look at:
> org.apache.activemq.kaha.impl.async.DataFile.openRandomAccessFile(DataFile.java:96):
> {code}
>   public synchronized RandomAccessFile openRandomAccessFile(boolean appender) throws IOException {
>        RandomAccessFile rc = new RandomAccessFile(file, "rw");
>        // When we start to write files size them up so that the OS has a chance
>        // to allocate the file contigously.
>        if (appender) {
>            if (length < preferedSize) {
>                        rc.setLength(preferedSize);
>            }
>        }
>        return rc;
>    }
> {code}
>   The problem is the rc.setLength(preferedSize) call: there is no try/catch
> block to close the opened file in case of an IOException, which can
> result from setLength on a full filesystem. Each failed call therefore
> leaves an open file descriptor behind.
>   Changing the method to contain a try/catch as follows appears, from my testing, to fix the
> issue (I have tried this on my local broker, and it works).
> {code}
>   public synchronized RandomAccessFile openRandomAccessFile(boolean appender) throws IOException {
>        RandomAccessFile rc = new RandomAccessFile(file, "rw");
>        // When we start to write files size them up so that the OS has a chance
>        // to allocate the file contigously.
>        if (appender) {
>            if (length < preferedSize) {
>                try
>                {
>                        rc.setLength(preferedSize);
>                }
>                catch(IOException e)
>                {
>                        try
>                        {
>                                rc.close();
>                        }
>                        catch(Exception closeException){}
>                        throw e;
>                }
>            }
>        }
>        return rc;
>    }
> {code}
> I shall attach a junit for testing. It is hard coded to write to my small removable disk (/Volumes/SSD/data), so you will need to change this; I needed somewhere where I could fill the disk up.  The junit just does:
> - Producer writes to a persistent queue until the disk space fills up and keeps on going.  After a while you see the "too many open files" exception.
> I've looked at trunk:
> https://svn.apache.org/repos/asf/activemq/trunk/activemq-core/src/main/java/org/apache/activemq/kaha/impl/async/DataFile.java
> This has the same code as 5.3.0.4, so I'm guessing it has the same issue.
> I'll attach the junit, the patch diff and the patch file.
> /dom
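
For readers who want to try the idea from the quoted report without filling a disk, here is a minimal standalone sketch of the same close-on-failure pattern. It is not the committed patch: the class name SafeDataFileOpen and the Resizer hook are hypothetical, introduced only so that the resize step can be made to fail on demand, and the "No space left on device" error is simulated by throwing an IOException from that hook.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration class; not part of ActiveMQ.
class SafeDataFileOpen {

    // Hypothetical hook for the pre-allocation step, so a test can inject a failure.
    interface Resizer {
        void resize(RandomAccessFile raf, long size) throws IOException;
    }

    // Opens the file and pre-allocates it; closes the handle if pre-allocation fails.
    static RandomAccessFile open(File file, long preferredSize, Resizer resizer) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            if (raf.length() < preferredSize) {
                resizer.resize(raf, preferredSize);
            }
            return raf;
        } catch (IOException e) {
            // Without this close, every failed call would leak one file descriptor.
            try {
                raf.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("data-", ".tmp");
        file.deleteOnExit();
        final RandomAccessFile[] seen = new RandomAccessFile[1];
        try {
            // Simulate the disk-full condition instead of actually filling a disk.
            open(file, 1024L, (raf, size) -> {
                seen[0] = raf;
                throw new IOException("No space left on device (simulated)");
            });
        } catch (IOException expected) {
            System.out.println("open failed: " + expected.getMessage());
            System.out.println("descriptor still valid: " + seen[0].getFD().valid());
        }
    }
}
```

Passing RandomAccessFile::setLength as the Resizer gives the normal behaviour. Recording the close failure with addSuppressed, rather than swallowing it as the patch in the report does, is a choice made for this sketch only.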
