Posted to dev@cassandra.apache.org by Shotaro Kamio <ka...@gmail.com> on 2011/04/20 11:25:32 UTC

Compacting single file forever

Hi,

I found that our cluster compacts a single file over and over, forever
(Cassandra 0.7.5). We are wondering whether the compaction logic is
wrong, and I'd like your comments.

Situation:
- After repairing a column family, our cluster's disk usage is quite
high and Cassandra cannot compact all sstables at once. It ends up
repeatedly compacting a single file (see the attached log below).
- Our data has no deletes, so compacting a single file frees no disk
space.

We are approaching a full disk. I believe the repair operation wrote a
lot of duplicate data to disk, which is why compaction is needed. But
most of the nodes are stuck compacting a single file, and the only
thing we can do is restart them.

My question is why the compaction doesn't stop.

I looked at the logic in CompactionManager.java:
-----------------
        String compactionFileLocation =
            table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
        // If the compaction file path is null, we have no space left for
        // this compaction; try again without the largest sstable.
        List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
        while (compactionFileLocation == null && smallerSSTables.size() > 1)
        {
            logger.warn("insufficient space to compact all requested files "
                        + StringUtils.join(smallerSSTables, ", "));
            smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
            compactionFileLocation =
                table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
        }
        if (compactionFileLocation == null)
        {
            logger.error("insufficient space to compact even the two smallest files, aborting");
            return 0;
        }
-----------------

The while condition is "smallerSSTables.size() > 1".
Shouldn't it be "smallerSSTables.size() > 2"?

In my understanding, compacting a single file frees disk space only
when the sstable holds a lot of tombstones, and only if those
tombstones are actually removed during the compaction. If Cassandra
knows an sstable has removable tombstones, it is worth compacting on
its own. Otherwise it might free space if you are lucky; in the worst
case it leads to an infinite loop, as in our case.

What do you think of a code change along these lines?
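
The idea, against the 0.7.5 code above (my suggestion only, untested),
is to stop the loop while at least two candidates remain:

-----------------
        // Stop before the candidate list degrades to a single sstable:
        // compacting one file cannot reclaim space when there are no
        // tombstones to purge, so it can loop forever.
        while (compactionFileLocation == null && smallerSSTables.size() > 2)
        {
            logger.warn("insufficient space to compact all requested files "
                        + StringUtils.join(smallerSSTables, ", "));
            smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
            compactionFileLocation =
                table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
        }
-----------------

With "> 2" the loop never removes the second-to-last file, so the
existing "insufficient space to compact even the two smallest files"
branch aborts instead of compacting one file forever.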


Best regards,
Shotaro


* Cassandra compaction log
-------------------------
 WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(
path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3034-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833
CompactionManager.java (line 482) Compacted to
foobar-tmp-f-3035-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
of original) bytes for 6,893,896 keys.  Time: 9,855,385ms.

 WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-3020-Data.db'),
SSTableReader(path='foobar-f-3035-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193
CompactionManager.java (line 482) Compacted to
foobar-tmp-f-3036-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
of original) bytes for 6,893,896 keys.  Time: 9,809,882ms.

 WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-3020-Data.db'),
SSTableReader(path='foobar-f-3036-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903
CompactionManager.java (line 482) Compacted to
foobar-tmp-f-3037-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
of original) bytes for 6,893,896 keys.  Time: 10,087,424ms.
-------------------------
You can see that the compacted size is always the same: it keeps
compacting the same single sstable.

Re: Compacting single file forever

Posted by Jonathan Ellis <jb...@gmail.com>.
https://issues.apache.org/jira/browse/CASSANDRA-2575

On Thu, Apr 21, 2011 at 11:56 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> I suggest as a workaround making the forceUserDefinedCompaction method
> ignore disk space estimates and attempt the requested compaction even
> if it guesses it will not have enough space. This would allow you to
> submit the 2 sstables you want manually.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Compacting single file forever

Posted by Terje Marthinussen <tm...@gmail.com>.
I think the really interesting part is how this node ended up in this
state in the first place.

There should be somewhere in the area of 340-500GB of data on it when
everything is 100% compacted. The problem is that it used more than 1TB
(we wiped it last night to test some 0.8 stuff).

To me, it seems there are some nasty potential worst cases.

Say you are in a fine spot: you have a 1TB disk, all data is compacted
into one sstable, and your data uses 300GB.

Now you issue a repair, and disk usage starts increasing. At the same
time, some events update a fairly large amount of non-overlapping data
on this node (or on the two nodes it replicates for), so you end up
with large sstables of similar size; but if you try to compact them,
you get back essentially a full copy of the dataset.

That is, you end up in a situation where the only sstables it tries to
merge are
sstable1, which has keys 1,2,3
sstable2, which has keys 4,5,6

So in the worst case we need:
- 300GB for the original compacted sstable
- an unknown amount of data from the repair
- an unknown amount of duplicates in smaller sstables
- space for the sum of the two sstables being merged (each around 170GB
here), so roughly 300GB + 2 x 170GB = 640GB committed before anything
is reclaimed

I think something like this happened here, and the node eventually
reached a state it cannot recover from, even though the total disk
space is around 4-6 times the optimally compacted data size.

This is something of a worst-case scenario, but it ends in an
unrecoverable situation, which is not good.

The only way I can think of to avoid this is to segment the sstables
by key range, so that no single sstable ever requires up to 50% of the
disk to compact, and so that sstables being compacted together are more
likely to share keys.

Maybe split them into directories named after token ranges, or simply
prefix the sstable names with the token range, so very little overhead
is added to look up data.

Something like
MyCF_00-08_Data.db
MyCF_08-ff_Data.db

where 00-08 is the token range of the keys in that sstable. These
ranges could change as compaction occurs, to keep things balanced and
to prevent any single sstable from getting very large. A rough sketch
of the lookup side is below.
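
A rough illustration of the lookup side (hypothetical code, not
Cassandra; assumes files are named MyCF_<start>-<end>_Data.db over
fixed-width hex token prefixes, with the end exclusive):

-----------------
import java.util.Arrays;
import java.util.List;

public class RangeNamedSSTables
{
    static final List<String> FILES = Arrays.asList(
        "MyCF_00-08_Data.db",
        "MyCF_08-ff_Data.db");

    /** Returns the sstable whose token range may contain the token. */
    public static String fileFor(String hexTokenPrefix)
    {
        for (String f : FILES)
        {
            String[] range = f.split("_")[1].split("-");
            // Lexicographic compare works because prefixes are fixed-width hex.
            if (hexTokenPrefix.compareTo(range[0]) >= 0
                && hexTokenPrefix.compareTo(range[1]) < 0)
                return f;
        }
        return null; // token not covered (e.g. exactly "ff" in this sketch)
    }

    public static void main(String[] args)
    {
        System.out.println(fileFor("03")); // MyCF_00-08_Data.db
        System.out.println(fileFor("5a")); // MyCF_08-ff_Data.db
    }
}
-----------------

Since each file's range is in its name, a read only has to consult the
files whose range covers the key's token.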

Terje


Re: Compacting single file forever

Posted by Jonathan Ellis <jb...@gmail.com>.
I suggest as a workaround making the forceUserDefinedCompaction method
ignore disk space estimates and attempt the requested compaction even
if it guesses it will not have enough space. This would allow you to
submit the 2 sstables you want manually.
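
For example, a rough JMX client sketch (hypothetical and untested; it
assumes the 0.7-era CompactionManager MBean exposes
forceUserDefinedCompaction(String ksname, String dataFiles) and that
JMX listens on the 0.7 default port 8080; verify both against your
build):

-----------------
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceCompaction
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName cm = new ObjectName(
                "org.apache.cassandra.db:type=CompactionManager");
            // Keyspace name ("MyKeyspace" is a placeholder) plus a
            // comma-separated list of Data.db files, as in the server log.
            mbs.invoke(cm, "forceUserDefinedCompaction",
                       new Object[] { "MyKeyspace",
                                      "foobar-f-3020-Data.db,foobar-f-3034-Data.db" },
                       new String[] { String.class.getName(),
                                      String.class.getName() });
        }
        finally
        {
            connector.close();
        }
    }
}
-----------------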




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Compacting single file forever

Posted by aaron morton <aa...@thelastpickle.com>.
Running at 78% disk capacity is somewhat out there on the edge.

The CompactionManager stats show that compactions are backing up. I'm guessing the minor compactions are not able to compact the full list of files they want to, so they cannot reduce the number of files in each compaction bucket below min_compaction_threshold, and compaction keeps being triggered.

The worst-case scenario for compaction is that the new file requires exactly as much disk space as the existing files. I'm not sure it's possible to get better estimates without processing the contents of the files. For example, consider a row spread over 3 sstables: 100MB in the first file, 1MB in the second, and 50MB in the third. Its size in the new file could be anywhere from 0MB to 151MB depending on tombstones and on which columns are in each sstable, their timestamps, TTL, and even their value (in the event of a name + timestamp column collision). We have to take the conservative approach and assume it will be 151MB until proven otherwise.

Currently compaction will try to find the biggest file that can fit into 90% of the data directory with the most free usable space. In your case, with 235G of free disk, that's about 211G, so at first glance it looks like it should only be compacting the 61G file rather than the 159G file.

But compaction sorts the files into buckets of similar size (I think this is for efficiency of the process; see the Facebook paper http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf). Generally, the files in a bucket are within 50% of the bucket's average file size. In your case I think it would create:

bucket 1: 61G
bucket 2: 159G, 191G, 196G, 197G

Bucket 1 does not have enough files to trigger a minor compaction (assuming the default threshold of 4). Bucket 2 does, but the only file it can compact is the 159G one, because only 211G of space is available. A simplified sketch of the bucketing follows.
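
To make the bucketing concrete, here is a simplified sketch (not the actual 0.7 code, which also sorts the files and applies the min/max thresholds): a file joins a bucket if its size is within 50% of the bucket's running average, otherwise it starts a new bucket.

-----------------
import java.util.ArrayList;
import java.util.List;

public class SizeBuckets
{
    public static List<List<Long>> bucket(List<Long> sizes)
    {
        List<List<Long>> buckets = new ArrayList<List<Long>>();
        for (long size : sizes)
        {
            boolean placed = false;
            for (List<Long> b : buckets)
            {
                long total = 0;
                for (long s : b)
                    total += s;
                long avg = total / b.size();
                // Within 50% of the bucket average joins the bucket.
                if (size > avg / 2 && size < avg * 3 / 2)
                {
                    b.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed)
            {
                List<Long> newBucket = new ArrayList<Long>();
                newBucket.add(size);
                buckets.add(newBucket);
            }
        }
        return buckets;
    }

    public static void main(String[] args)
    {
        List<Long> sizes = new ArrayList<Long>();
        for (long s : new long[] { 61, 159, 191, 196, 197 })
            sizes.add(s);
        // Prints [[61], [159, 191, 196, 197]], matching the buckets above.
        System.out.println(bucket(sizes));
    }
}
-----------------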
 
The problem here is not enough disk space, and the symptom is compaction being unable to make progress, where progress means compacting bucket 2 into one file, which would need 743G of free space.

By changing the min_compaction_threshold on the CF you can stop the endless compactions. That will buy you some time, but ultimately you need more disk space. 

Hope that gives some background on what (I think) is happening.
Aaron



Re: Compacting single file forever

Posted by Shotaro Kamio <ka...@gmail.com>.
Hi Aaron,


Maybe my previous description was not clear: it's not a compaction
threshold problem.
In fact, Cassandra tries to compact 7 sstables in the minor compaction,
but it drops them one by one due to insufficient disk space, and at the
end it compacts a single file, as in the new log below.

Compactionstats on a node says:

  compaction type: Minor
  column family: foobar
  bytes compacted: 133473101929
  bytes total in progress: 170000743825
  pending tasks: 12

The disk usage reaches 78%; it's a really tough situation. But I
suspect the data contains a lot of duplicates, because we feed the same
data again and again and then run repair.


Another thing I'm wondering about is the file selection algorithm.
For example, one of our disks has 235G of free space and holds sstables
of 61G, 159G, 191G, 196G, and 197G. The one Cassandra tries to compact
forever is the 159G sstable, even though a smaller one exists; ideally
it should try compacting 61G + 159G, as in the sketch below.
A more intelligent algorithm is needed to find the optimal combination.
And if Cassandra kept statistics on the amount of deleted and
out-of-date data per sstable, that would help it find more efficient
file combinations.
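
A hypothetical sketch of what I mean (my suggestion, not current
Cassandra behaviour): instead of repeatedly dropping the largest
candidate, pick the largest set of smallest sstables whose combined
size fits the free space, so at least two files always get merged.

-----------------
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PickCompactionSet
{
    /** Returns the sstable sizes to compact together, or an empty list
     *  if no pair fits in freeSpace (sizes and freeSpace in one unit). */
    public static List<Long> pick(List<Long> sstableSizes, long freeSpace)
    {
        List<Long> sorted = new ArrayList<Long>(sstableSizes);
        Collections.sort(sorted); // smallest first
        List<Long> chosen = new ArrayList<Long>();
        long total = 0;
        for (long size : sorted)
        {
            // Conservative estimate: the output can be as large as the inputs.
            if (total + size > freeSpace)
                break;
            chosen.add(size);
            total += size;
        }
        return chosen.size() >= 2 ? chosen : Collections.<Long>emptyList();
    }

    public static void main(String[] args)
    {
        List<Long> sizes = new ArrayList<Long>();
        for (long s : new long[] { 61, 159, 191, 196, 197 })
            sizes.add(s);
        // With 235G free this picks [61, 159], not the lone 159G file.
        System.out.println(pick(sizes, 235));
    }
}
-----------------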


Regards,
Shotaro



* Minor compaction log
-----
 WARN [CompactionExecutor:1] 2011-04-21 21:44:08,554
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-773-Data.db'),
SSTableReader(path='foobar-f-1452-Data.db'),
SSTableReader(path='foobar-f-1620-Data.db'),
SSTableReader(path='foobar-f-1642-Data.db'),
SSTableReader(path='foobar-f-1643-Data.db'),
SSTableReader(path='foobar-f-1690-Data.db'),
SSTableReader(path='foobar-f-1814-Data.db')
 WARN [CompactionExecutor:1] 2011-04-21 21:44:28,565
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-773-Data.db'),
SSTableReader(path='foobar-f-1452-Data.db'),
SSTableReader(path='foobar-f-1642-Data.db'),
SSTableReader(path='foobar-f-1643-Data.db'),
SSTableReader(path='foobar-f-1690-Data.db'),
SSTableReader(path='foobar-f-1814-Data.db')
 WARN [CompactionExecutor:1] 2011-04-21 21:44:48,576
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-773-Data.db'),
SSTableReader(path='foobar-f-1452-Data.db'),
SSTableReader(path='foobar-f-1642-Data.db'),
SSTableReader(path='foobar-f-1643-Data.db'),
SSTableReader(path='foobar-f-1814-Data.db')
 WARN [CompactionExecutor:1] 2011-04-21 21:45:08,586
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-1452-Data.db'),
SSTableReader(path='foobar-f-1642-Data.db'),
SSTableReader(path='foobar-f-1643-Data.db'),
SSTableReader(path='foobar-f-1814-Data.db')
 WARN [CompactionExecutor:1] 2011-04-21 21:45:28,596
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-1642-Data.db'),
SSTableReader(path='foobar-f-1643-Data.db'),
SSTableReader(path='foobar-f-1814-Data.db')
 WARN [CompactionExecutor:1] 2011-04-21 21:45:48,607
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-1642-Data.db'),
SSTableReader(path='foobar-f-1814-Data.db')
------






-- 
Shotaro Kamio

Re: Compacting single file forever

Posted by aaron morton <aa...@thelastpickle.com>.
I want to check whether you are talking about minor compactions or major (nodetool) compactions.
What compaction settings do you have for this CF? You can increase the min compaction threshold to reduce the frequency of compactions: http://wiki.apache.org/cassandra/StorageConfiguration
It seems like compaction is running continually; are there pending tasks in the o.a.c.db.CompactionManager MBean?
How bad is your disk space problem?

As for the code change, AFAIK it's not possible for Cassandra to know whether an SSTable contains purgeable tombstones until the rows are read. Perhaps the file could record the earliest deleted-at time somewhere (same for TTL), but I do not think we do that now. A rough sketch of that idea is below.
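
A rough sketch of the idea (hypothetical, not existing Cassandra code): collect the earliest local deletion time while writing an sstable and persist it in the sstable's metadata, so compaction can tell whether the file contains purgeable tombstones without reading the rows.

-----------------
public class TombstoneMetadata
{
    private long earliestDeletedAt = Long.MAX_VALUE;

    /** Call for every tombstone written to the sstable. */
    public void updateDeletedAt(long localDeletionTimeSeconds)
    {
        earliestDeletedAt = Math.min(earliestDeletedAt, localDeletionTimeSeconds);
    }

    /** True if some tombstone has passed gc_grace and could be purged. */
    public boolean hasPurgeableTombstones(long nowSeconds, long gcGraceSeconds)
    {
        return earliestDeletedAt != Long.MAX_VALUE
            && earliestDeletedAt + gcGraceSeconds < nowSeconds;
    }
}
-----------------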

Hope that helps. 
Aaron




Re: Compacting single file forever

Posted by aaron morton <aa...@thelastpickle.com>.
Moving to the user list. 

Aaron
