Posted to user@cassandra.apache.org by "B. Todd Burruss" <bb...@real.com> on 2010/04/20 19:33:46 UTC

cleaning house

I'm trying to draw some correlation between the size of my data and the 
space used on disk.  I have set <GCGraceSeconds>1</GCGraceSeconds>, so 
there isn't any reason to keep data around.

My approach is this:

After only doing "puts" to Cassandra for a while, I stop my client and 
want to perform the proper "cleanup" and/or "compact" operations that 
will reduce the disk space used to a minimum.  However, I can't seem to 
figure it out.  I've done "major compaction", "cleanup", etc., but they 
don't seem to get the job done.

So, two questions:

- What procedure is suggested to get rid of all unnecessary data?
- What does the following "Compacted" file mean?  It seems to mark 
  sstable "88" as compacted, but according to the compaction manager 
  no more compactions are happening.

-rw-rw-r-- 1 bburruss bburruss          0 Apr 20 08:32 bucket-88-Compacted
-rw-rw-r-- 1 bburruss bburruss 1445218042 Apr 19 21:39 bucket-88-Data.db
-rw-rw-r-- 1 bburruss bburruss   12255925 Apr 19 21:39 bucket-88-Filter.db
-rw-rw-r-- 1 bburruss bburruss  451806386 Apr 19 21:39 bucket-88-Index.db
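To see how much space marked-but-not-yet-deleted sstables are still holding, a minimal Java sketch can scan the data directory. It assumes the <columnfamily>-<generation>-<component> file naming visible in the listing (bucket-88-Data.db, bucket-88-Compacted) and treats every file sharing a marked prefix as obsolete. CompactedMarkerScan and obsoleteBytes are hypothetical names, not part of Cassandra, and later releases name their files differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Cassandra. Assumes the
// <columnfamily>-<generation>-<component> naming seen in the listing,
// e.g. bucket-88-Data.db and the zero-length marker bucket-88-Compacted.
class CompactedMarkerScan {
    static final String MARKER_SUFFIX = "-Compacted";

    // Returns sstable prefix (e.g. "bucket-88") -> bytes still used by its files.
    static Map<String, Long> obsoleteBytes(Path dataDir) throws IOException {
        Map<String, Long> result = new TreeMap<>();

        // Pass 1: every "<prefix>-Compacted" file marks <prefix> as obsolete.
        try (DirectoryStream<Path> markers =
                 Files.newDirectoryStream(dataDir, "*" + MARKER_SUFFIX)) {
            for (Path marker : markers) {
                String name = marker.getFileName().toString();
                result.put(name.substring(0, name.length() - MARKER_SUFFIX.length()), 0L);
            }
        }

        // Pass 2: add up the size of every file that belongs to a marked prefix.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                for (Map.Entry<String, Long> entry : result.entrySet()) {
                    if (name.startsWith(entry.getKey() + "-")) {
                        entry.setValue(entry.getValue() + Files.size(file));
                    }
                }
            }
        }
        return result;
    }
}
```

Pointed at the directory holding the files listed above, this would report bucket-88 with the combined size of its Data, Filter and Index files, 1,909,280,353 bytes (about 1.9 GB), for as long as the marker and those files remain on disk.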


Re: cleaning house

Posted by "B. Todd Burruss" <bb...@real.com>.
thx, that did the trick.


Re: cleaning house

Posted by Jonathan Ellis <jb...@gmail.com>.
Added to http://wiki.apache.org/cassandra/MemtableSSTable:

SSTables that are obsoleted by a compaction are deleted asynchronously
when the JVM performs a GC.  You can force a GC from jconsole, but this
is not necessary; Cassandra will force one itself if it detects that it
is low on space.  A compaction marker (the zero-length *-Compacted file)
is also added to obsolete sstables so they can be deleted on startup if
the server does not perform a GC before being restarted.
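Forcing that GC does not strictly require jconsole: any JMX client can invoke the gc operation on the JVM's platform Memory MXBean, which is roughly what jconsole's "Perform GC" button does. A minimal sketch, assuming remote JMX is enabled on the Cassandra JVM and that you know its port (the value your startup script passes as com.sun.management.jmxremote.port, typically 8080 in 0.6-era scripts and 7199 in later releases). ForceGc is a hypothetical helper name.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: request a full GC in a (possibly remote) JVM over JMX.
class ForceGc {
    // Invokes gc() on the java.lang:type=Memory MXBean of the JVM behind conn.
    static void gc(MBeanServerConnection conn) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        memory.gc();
    }

    // Connects to host:port (an assumption; use your node's JMX port) and requests a GC.
    static void gcRemote(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            gc(connector.getMBeanServerConnection());
        }
    }
}
```

For example, ForceGc.gcRemote("localhost", 8080) against a local 0.6 node; once the GC has run, the obsolete sstables should be released and their files removed shortly afterwards.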

ColumnFamilyStoreMBean exposes sstable space used as getLiveDiskSpaceUsed
(only includes the size of non-obsolete files) and getTotalDiskSpaceUsed
(includes everything).
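The gap between the live and total figures is the space still held by obsolete sstables awaiting deletion. A minimal sketch reads both over JMX, where a standard MBean getter getX appears as attribute X. The ObjectName pattern org.apache.cassandra.db:type=ColumnFamilyStores,keyspace=...,columnfamily=... is an assumption about how 0.6-era releases registered column family stores, so confirm it in jconsole first; DiskSpaceReport is a hypothetical name.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper. The ObjectName pattern is an assumption about how
// 0.6-era Cassandra registered its column family stores; verify it in jconsole.
class DiskSpaceReport {
    static ObjectName cfStoreName(String keyspace, String columnFamily) throws JMException {
        return new ObjectName("org.apache.cassandra.db:type=ColumnFamilyStores,keyspace="
                + keyspace + ",columnfamily=" + columnFamily);
    }

    // getLiveDiskSpaceUsed()/getTotalDiskSpaceUsed() surface as the JMX
    // attributes "LiveDiskSpaceUsed" and "TotalDiskSpaceUsed".
    static long obsoleteBytes(MBeanServerConnection conn, ObjectName cfStore)
            throws IOException, JMException {
        long live = ((Number) conn.getAttribute(cfStore, "LiveDiskSpaceUsed")).longValue();
        long total = ((Number) conn.getAttribute(cfStore, "TotalDiskSpaceUsed")).longValue();
        // Bytes in sstables that compaction has obsoleted but that are not yet deleted.
        return total - live;
    }
}
```

A nonzero result right after a major compaction simply means the old files are still waiting for a GC (or a restart) before they are deleted.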

Re: cleaning house

Posted by "B. Todd Burruss" <bb...@real.com>.
I have done no deletes, just inserts, so you are correct: there isn't 
any "data" to clean up.  However, when I run some of the cleanup and/or 
compaction tasks the space used on disk actually grows, and I would like 
to force any unneeded files to be removed.  As I write this, Jonathan 
has responded with what I believe I need.

thx!

Re: cleaning house

Posted by Benjamin Black <b...@b3k.us>.
Are you deleting data through the API or just doing a bunch of inserts
and then running a compaction?  The latter will not result in anything
to clean up since data must be explicitly deleted.
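To make that concrete: a compaction only drops data that carries a deletion marker (a tombstone), and only once GCGraceSeconds have elapsed since the delete. The following is a simplified, column-level model of that rule, not Cassandra's actual code; the Column type and the exact <= boundary are assumptions for illustration. With inserts only, no column has a deletion time, so nothing is ever purgeable.

```java
import java.util.List;

// Conceptual model only, not Cassandra's code: which columns a major
// compaction could drop, given the <GCGraceSeconds> setting.
class TombstonePurge {
    static final class Column {
        final String name;
        // Seconds since the epoch when the column was deleted, or null if it
        // was only ever inserted (no tombstone).
        final Integer localDeletionTime;

        Column(String name, Integer localDeletionTime) {
            this.name = name;
            this.localDeletionTime = localDeletionTime;
        }
    }

    static boolean purgeable(Column c, int gcGraceSeconds, int nowSeconds) {
        if (c.localDeletionTime == null) {
            return false; // live data is never removed by compaction
        }
        // A tombstone must survive the grace period before it can be dropped.
        int gcBefore = nowSeconds - gcGraceSeconds;
        return c.localDeletionTime <= gcBefore;
    }

    static long countPurgeable(List<Column> columns, int gcGraceSeconds, int nowSeconds) {
        return columns.stream()
                .filter(c -> purgeable(c, gcGraceSeconds, nowSeconds))
                .count();
    }
}
```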


b