Posted to user@cassandra.apache.org by Timo Nentwig <ti...@toptarif.de> on 2010/12/22 10:36:00 UTC

java.io.IOException: No space left on device

So, this is my ring, the third node ran out of disk space:

Address         Status State   Load            Owns    Token                                       
                                                       139315361777093290765734121398073449298     
192.168.68.76   Up     Normal  37.83 GB        39.03%  35588525726498188223628193834394222094      
192.168.68.80   Up     Normal  56.13 GB        24.65%  77536499411085694672769767983714006803      
192.168.68.69   Down   Normal  70.62 GB        24.66%  119490787523049504815252019414532364540     
192.168.68.233  Up     Normal  52.92 GB        5.83%   129401794004285600053753090610420021554     
192.168.68.70   Up     Normal  32.8 GB         5.83%   139315361777093290765734121398073449298 

And the data could be more evenly balanced, obviously. However, the node fails to start up because of the lack of disk space (instead of starting up and denying further writes, it appears to try to process the [6.6G!] commit logs). So I can no longer perform any actions on it, such as re-balancing the ring or reading old data from it and rotating it somewhere else. So, what to do now?
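
For reference, a minimal sketch of how evenly spaced tokens could be computed for a ring of this size. It assumes the RandomPartitioner with a token space of 0 to 2**127; the function name `balanced_tokens` is hypothetical and not part of Cassandra.

```python
# Sketch: evenly spaced initial tokens for an N-node ring.
# Assumption: RandomPartitioner, whose token space is [0, 2**127).
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count tokens that split the ring into equal ranges."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the 127-bit values exact.
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for token in balanced_tokens(5):
        print(token)
```

With five nodes on such tokens, each node would be the primary replica for 20% of the token space.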

BTW what precisely does the Owns column mean?

Re: java.io.IOException: No space left on device

Posted by Tyler Hobbs <ty...@riptano.com>.
>>> BTW what precisely does the Owns column mean?
>>
>> The percentage of the token space owned by the node.
>
> Precisely meaning what? :) On my ring of 5 machines, 3 own about 1/3 and 2
> own only 5% (and one of these contains 1/3 more data than the two largest
> in the cluster; it's actually the one that ran out of disk space).

That should be: the percentage of the token space *that the node is the
primary replica for*.  If you have RF > 1, a node's load also includes data
from ranges that come before it on the ring.

- Tyler
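
As an illustration of the explanation above, here is a small sketch that recomputes the Owns column from the tokens in the ring listing. It assumes the RandomPartitioner (token space of 2**127) and, for the replicated case, a simple rack-unaware placement where replicas go to the next nodes clockwise. The thread does not state the replication factor, so the `rf` parameter is hypothetical, as are the function names.

```python
# Sketch: reproduce the "Owns" column from the tokens in the ring listing.
# Assumptions: RandomPartitioner (token space 2**127); with RF > 1, replicas
# are placed on the next nodes clockwise (simple, rack-unaware placement).
TOKEN_SPACE = 2 ** 127

RING = [  # (address, token), sorted by token, copied from the ring listing
    ("192.168.68.76", 35588525726498188223628193834394222094),
    ("192.168.68.80", 77536499411085694672769767983714006803),
    ("192.168.68.69", 119490787523049504815252019414532364540),
    ("192.168.68.233", 129401794004285600053753090610420021554),
    ("192.168.68.70", 139315361777093290765734121398073449298),
]


def primary_ownership(ring):
    """Fraction of the token space each node is the primary replica for."""
    owns = []
    for i, (_, token) in enumerate(ring):
        previous = ring[i - 1][1]  # i == 0 wraps around to the last node
        # A single-node ring has width 0 here, so fall back to the full space.
        width = (token - previous) % TOKEN_SPACE or TOKEN_SPACE
        owns.append(width / TOKEN_SPACE)
    return owns


def replicated_ownership(ring, rf):
    """Fraction of the token space each node stores for a hypothetical RF."""
    owns = primary_ownership(ring)
    n = len(owns)
    rf = min(rf, n)  # a node cannot hold more than every range once
    # A node also holds the ranges of the rf - 1 nodes preceding it.
    return [sum(owns[(i - k) % n] for k in range(rf)) for i in range(n)]


if __name__ == "__main__":
    primary = primary_ownership(RING)
    stored = replicated_ownership(RING, rf=2)  # rf=2 is only an example
    for (address, _), p, s in zip(RING, primary, stored):
        print(f"{address:<15} owns {p * 100:6.2f}%  stores {s * 100:6.2f}%")
```

Under these assumptions the primary percentages match the listing (39.03%, 24.65%, 24.66%, 5.83%, 5.83%). With a hypothetical RF of 2, 192.168.68.233 would store roughly 30% of the data despite an Owns value under 6%, because it also holds the range of the node before it.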


Re: java.io.IOException: No space left on device

Posted by Timo Nentwig <ti...@toptarif.de>.
On Dec 22, 2010, at 16:20, Peter Schuller wrote:

>> And the data could be more evenly balanced, obviously. However, the node fails to start up because of the lack of disk space (instead of starting up and denying further writes, it appears to try to process the [6.6G!] commit logs). So I can no longer perform any actions on it, such as re-balancing the ring or reading old data from it and rotating it somewhere else. So, what to do now?
> 
> So even given deletion of obsolete sstables on start-up, it runs out
> of disk just from the commit log replay of only 6 gig? Sounds like
> you're very, very full.

Answer:

$ time cassandra -f
 INFO 16:30:09,486 Heap size: 2143158272/2143158272
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
        at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
        at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:347)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:73)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
 INFO 16:30:09,495 JNA not found. Native methods will be disabled.
 INFO 16:30:09,504 Loading settings from file:/home/dev/cassandra.git/conf/cassandra.yaml
 INFO 16:30:09,774 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 16:30:09,849 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1293031809849.log
ERROR 16:30:09,853 Exception encountered during startup.
java.io.IOError: java.io.IOException: No space left on device
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:59)
        at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:113)
        at org.apache.cassandra.db.commitlog.CommitLog.<clinit>(CommitLog.java:83)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:347)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:76)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.write(Native Method)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
        at org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:121)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:55)
        ... 7 more
Exception encountered during startup.
java.io.IOError: java.io.IOException: No space left on device
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:59)
        at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:113)
        at org.apache.cassandra.db.commitlog.CommitLog.<clinit>(CommitLog.java:83)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:347)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:76)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.write(Native Method)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
        at org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:121)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:55)
        ... 7 more

real    0m1.210s
user    0m0.600s
sys     0m0.100s

So it's instantly dead. It doesn't even attempt to delete any potentially old data. sstable2json dies instantly as well!

Actually, 5% is still available, but it is not accessible to non-root users (ext3). So the cassandra user literally cannot write a single byte to that volume anymore.
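
A minimal sketch of how that gap can be observed: `os.statvfs` reports both the blocks that are free in total (`f_bfree`) and the blocks available to unprivileged users (`f_bavail`); on ext3 the difference is the root-reserved area (5% by default). The helper name `summarize` and the numbers in `EXAMPLE` are illustrative assumptions, not values measured on the node in this thread.

```python
# Sketch: show why a volume can be "full" for the cassandra user while
# root still sees free space (ext3 reserves blocks for root).
import os
from types import SimpleNamespace


def summarize(st):
    """Return byte counts from an os.statvfs()-style result."""
    block = st.f_frsize
    return {
        "total": st.f_blocks * block,
        "free_for_root": st.f_bfree * block,    # includes reserved blocks
        "free_for_users": st.f_bavail * block,  # what non-root can use
        "reserved_for_root": (st.f_bfree - st.f_bavail) * block,
    }


# Illustrative numbers only: 5% of the blocks are free, but all of them
# are reserved for root, so an unprivileged user cannot write anything.
EXAMPLE = SimpleNamespace(f_frsize=4096, f_blocks=1_000_000,
                          f_bfree=50_000, f_bavail=0)

if __name__ == "__main__":
    # The path comes from the commit log location in the log output;
    # fall back to "/" when it does not exist on this machine.
    path = "/var/lib/cassandra"
    st = os.statvfs(path if os.path.exists(path) else "/")
    for name, value in summarize(st).items():
        print(f"{name:>18}: {value / 2**30:.2f} GiB")
```

Monitoring `free_for_users` rather than `free_for_root` reflects what the cassandra user can actually write.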

> Some potential options may be:
> 
> * Replace the node completely with a new one with sufficient disk /
> another token location (but carefully, by adding the new node first).

That may not always be an option :)

> It strikes me that dealing with out-of-disk conditions is probably a
> good topic of an in-depth wiki page for what to do in various cases,
> depending on usage. The above three suggested options may or may not

Yes! :)

>> BTW what precisely does the Owns column mean?
> 
> The percentage of the token space owned by the node.

Precisely meaning what? :) On my ring of 5 machines, 3 own about 1/3 and 2 own only 5% (and one of these contains 1/3 more data than the two largest in the cluster; it's actually the one that ran out of disk space).

Re: java.io.IOException: No space left on device

Posted by Peter Schuller <pe...@infidyne.com>.
>> In any case: Monitoring disk-space is very very important.
>
> So, why doesn't cassandra monitor it itself and stop accepting writes if it runs out of space?

For one thing, it's non-trivial to do accurately because disk space
usage varies over time due to background compaction and/or
anti-compaction. Compaction will require different amounts of
disk space depending on the nature of the writes (e.g., if all writes
are removals, compaction takes much less space than if all writes are
non-overwriting inserts).

Maybe the maximum possible disk space use can be kept track of in a
conservative fashion (assuming all writes are non-overwriting inserts,
assuming a maximally disk-space intensive repair will be run, etc) and
then behave appropriately based on that. If someone has a use case
where the cluster is sufficiently close to running out of disk space
(but it works for the particular use-case), such a feature could be
turned off. But it would make it less easy to accidentally write
yourself into a corner.
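
The conservative approach described above could be sketched roughly as follows. This is not a Cassandra feature; the function name, the `safety_factor` parameter, and the example sizes are hypothetical. The assumption is that in the worst case (all writes are non-overwriting inserts) a compaction temporarily needs about as much free space as the data it rewrites, since the merged output is written before the inputs are deleted.

```python
# Sketch of a conservative free-space check, not a Cassandra feature.
# Assumption: worst-case compaction output is as large as its input, so
# the free space must be at least the live data size times a safety factor.
GIB = 2 ** 30


def can_accept_writes(free_bytes, live_data_bytes, safety_factor=1.0):
    """Return True if a worst-case compaction would still fit on disk."""
    if free_bytes < 0 or live_data_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return free_bytes >= live_data_bytes * safety_factor


if __name__ == "__main__":
    # Hypothetical node: 70 GiB of live data and 10 GiB of free space.
    print(can_accept_writes(free_bytes=10 * GIB, live_data_bytes=70 * GIB))
    # Hypothetical node: 30 GiB of live data and 40 GiB of free space.
    print(can_accept_writes(free_bytes=40 * GIB, live_data_bytes=30 * GIB))
```

A check like this errs on the side of refusing writes early, which is why an option to turn it off would be needed for clusters that knowingly run close to full.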

-- 
/ Peter Schuller

Re: java.io.IOException: No space left on device

Posted by Timo Nentwig <ti...@toptarif.de>.
On Dec 22, 2010, at 16:20, Peter Schuller wrote:

> In any case: Monitoring disk-space is very very important.

So, why doesn't cassandra monitor it itself and stop accepting writes if it runs out of space?

Re: java.io.IOException: No space left on device

Posted by Peter Schuller <pe...@infidyne.com>.
> And the data could be more evenly balanced, obviously. However, the node fails to start up because of the lack of disk space (instead of starting up and denying further writes, it appears to try to process the [6.6G!] commit logs). So I can no longer perform any actions on it, such as re-balancing the ring or reading old data from it and rotating it somewhere else. So, what to do now?

So even given deletion of obsolete sstables on start-up, it runs out
of disk just from the commit log replay of only 6 gig? Sounds like
you're very, very full.

Some potential options may be:

* Replace the node completely with a new one with sufficient disk /
another token location (but carefully, by adding the new node first).
* Restart the node with its sstables gone and perform a repair to
re-populate it with a minimum amount of overhead (by not writing new
data to it).
* Remove some of the sstables with the aim of allowing commit log
replay to complete, followed by a repair.

But the latter two assume RF > 1 and that you are willing to lose
writes (potentially), depending on the consistency level you have used
for writes.

It strikes me that dealing with out-of-disk conditions is probably a
good topic of an in-depth wiki page for what to do in various cases,
depending on usage. The above three suggested options may or may not
be a good idea depending on circumstances. In particular I don't
remember if a repair or bootstrap still, in 0.7, consumes disk-space
on the nodes *from* which data is being sent (anyone?).

In any case: Monitoring disk-space is very very important.

> BTW what precisely does the Owns column mean?

The percentage of the token space owned by the node.

-- 
/ Peter Schuller