Posted to user@cassandra.apache.org by Ramzi Rabah <rr...@playdom.com> on 2009/12/04 00:18:49 UTC

Removes increasing disk space usage in Cassandra?

Hi all,

I ran a test where I inserted about 1.2 gigabytes of data into
each node of a 4-node cluster.
I ran a script that first calls a get on each column inserted, followed
by a remove. Since I was basically removing every entry
I inserted before, I expected the disk space occupied by the
nodes to go down and eventually reach 0. Instead, the disk space
goes up to about 1.8 gigabytes per node when I do the bulk removes.
Am I missing something here?

Thanks a lot for your help
Ray
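
For reference, here is a minimal sketch of the kind of get-then-remove loop described above. The SimpleClient interface and its method names are hypothetical stand-ins for whatever client API the script used, not the actual Thrift API of that era:

    // Hypothetical client interface, used only for illustration.
    interface SimpleClient {
        byte[] get(String key, String columnFamily, String column) throws Exception;
        void remove(String key, String columnFamily, String column, long timestamp) throws Exception;
    }

    class DrainTest {
        // Read back every column that was inserted, then delete it.
        static void drainAll(SimpleClient client, Iterable<String> keys, String cf, String col) throws Exception {
            for (String key : keys) {
                byte[] value = client.get(key, cf, col);   // confirm the column is there
                if (value != null) {
                    // a remove does not reclaim disk space immediately; it writes a deletion marker,
                    // which is why disk usage can grow during a bulk delete
                    client.remove(key, cf, col, System.currentTimeMillis());
                }
            }
        }
    }

As the replies below explain, each remove is itself a write (a deletion marker), so disk usage grows during the bulk delete and only shrinks once compaction discards the markers and the data they shadow.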

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
Done
https://issues.apache.org/jira/browse/CASSANDRA-604

On Fri, Dec 4, 2009 at 4:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Please do.
>
> On Fri, Dec 4, 2009 at 5:53 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> Thanks Jonathan.
>> Should I open a bug for this?
>>
>> Ray
>>
>> On Fri, Dec 4, 2009 at 3:47 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> On Fri, Dec 4, 2009 at 5:32 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>> Starting with fresh directories with no data and trying to do simple
>>>> inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
>>>> decided to dig deeper into the code.
>>>>
>>>> I was looking at the code for compaction, and this is a very noob
>>>> concern, so please bear with me if I'm way off, this code is all new
>>>> to me. When we are doing compactions during the normal course of
>>>> cassandra, we call:
>>>>
>>>>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>>>>            {
>>>>                if (sstables.size() < minThreshold)
>>>>                {
>>>>                    continue;
>>>>                }
>>>>                // otherwise, do the compaction for this bucket ...
>>>>
>>>> where getCompactionBuckets puts in buckets very small files, or files
>>>> that are 0.5-1.5 of each other's sizes. It will only compact those if
>>>> they are >= minimum threshold which is 4 by default.
>>>
>>> Exactly right.
>>>
>>>> So far so good. Now how about this scenario, I have an old entry that
>>>> I inserted long time ago and that was compacted into a 75MB file.
>>>> There are fewer 75MB files than 4. I do many deletes, and I end with 4
>>>> extra sstable files filled with tombstones, each about 300 MB large.
>>>> These 4 files are compacted together and in the compaction code, if
>>>> the tombstone is there we don't copy it over to the new file. Now
>>>> since we did not compact the 75MB files, but we compacted the
>>>> tombstone files, doesn't that leave us with the tombstone gone, but
>>>> the data still intact in the 75MB file?
>>>
>>> Also right.  Glad you had a look! :)
>>>
>>> One relatively easy fix would be to only GC the tombstones if there
>>> are no SSTables left for that CF older than the ones being compacted.
>>> (So, a "major" compaction, which compacts all SSTables and is what
>>> nodeprobe invokes, would always GC eligible tombstones.)
>>>
>>> -Jonathan
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
Please do.

On Fri, Dec 4, 2009 at 5:53 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> Thanks Jonathan.
> Should I open a bug for this?
>
> Ray
>
> On Fri, Dec 4, 2009 at 3:47 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Fri, Dec 4, 2009 at 5:32 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>> Starting with fresh directories with no data and trying to do simple
>>> inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
>>> decided to dig deeper into the code.
>>>
>>> I was looking at the code for compaction, and this is a very noob
>>> concern, so please bear with me if I'm way off, this code is all new
>>> to me. When we are doing compactions during the normal course of
>>> cassandra, we call:
>>>
>>>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>>>            {
>>>                if (sstables.size() < minThreshold)
>>>                {
>>>                    continue;
>>>                }
>>>                // otherwise, do the compaction for this bucket ...
>>>
>>> where getCompactionBuckets puts in buckets very small files, or files
>>> that are 0.5-1.5 of each other's sizes. It will only compact those if
>>> they are >= minimum threshold which is 4 by default.
>>
>> Exactly right.
>>
>>> So far so good. Now how about this scenario, I have an old entry that
>>> I inserted long time ago and that was compacted into a 75MB file.
>>> There are fewer 75MB files than 4. I do many deletes, and I end with 4
>>> extra sstable files filled with tombstones, each about 300 MB large.
>>> These 4 files are compacted together and in the compaction code, if
>>> the tombstone is there we don't copy it over to the new file. Now
>>> since we did not compact the 75MB files, but we compacted the
>>> tombstone files, doesn't that leave us with the tombstone gone, but
>>> the data still intact in the 75MB file?
>>
>> Also right.  Glad you had a look! :)
>>
>> One relatively easy fix would be to only GC the tombstones if there
>> are no SSTables left for that CF older than the ones being compacted.
>> (So, a "major" compaction, which compacts all SSTables and is what
>> nodeprobe invokes, would always GC eligible tombstones.)
>>
>> -Jonathan
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
Thanks Jonathan.
Should I open a bug for this?

Ray

On Fri, Dec 4, 2009 at 3:47 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Fri, Dec 4, 2009 at 5:32 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> Starting with fresh directories with no data and trying to do simple
>> inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
>> decided to dig deeper into the code.
>>
>> I was looking at the code for compaction, and this is a very noob
>> concern, so please bear with me if I'm way off, this code is all new
>> to me. When we are doing compactions during the normal course of
>> cassandra, we call:
>>
>>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>>            {
>>                if (sstables.size() < minThreshold)
>>                {
>>                    continue;
>>                }
>>                // otherwise, do the compaction for this bucket ...
>>
>> where getCompactionBuckets puts in buckets very small files, or files
>> that are 0.5-1.5 of each other's sizes. It will only compact those if
>> they are >= minimum threshold which is 4 by default.
>
> Exactly right.
>
>> So far so good. Now how about this scenario, I have an old entry that
>> I inserted long time ago and that was compacted into a 75MB file.
>> There are fewer 75MB files than 4. I do many deletes, and I end with 4
>> extra sstable files filled with tombstones, each about 300 MB large.
>> These 4 files are compacted together and in the compaction code, if
>> the tombstone is there we don't copy it over to the new file. Now
>> since we did not compact the 75MB files, but we compacted the
>> tombstone files, doesn't that leave us with the tombstone gone, but
>> the data still intact in the 75MB file?
>
> Also right.  Glad you had a look! :)
>
> One relatively easy fix would be to only GC the tombstones if there
> are no SSTables left for that CF older than the ones being compacted.
> (So, a "major" compaction, which compacts all SSTables and is what
> nodeprobe invokes, would always GC eligible tombstones.)
>
> -Jonathan
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Dec 4, 2009 at 5:32 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> Starting with fresh directories with no data and trying to do simple
> inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
> decided to dig deeper into the code.
>
> I was looking at the code for compaction, and this is a very noob
> concern, so please bear with me if I'm way off, this code is all new
> to me. When we are doing compactions during the normal course of
> cassandra, we call:
>
>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>            {
>                if (sstables.size() < minThreshold)
>                {
>                    continue;
>                }
>                // otherwise, do the compaction for this bucket ...
>
> where getCompactionBuckets puts in buckets very small files, or files
> that are 0.5-1.5 of each other's sizes. It will only compact those if
> they are >= minimum threshold which is 4 by default.

Exactly right.

> So far so good. Now how about this scenario, I have an old entry that
> I inserted long time ago and that was compacted into a 75MB file.
> There are fewer 75MB files than 4. I do many deletes, and I end with 4
> extra sstable files filled with tombstones, each about 300 MB large.
> These 4 files are compacted together and in the compaction code, if
> the tombstone is there we don't copy it over to the new file. Now
> since we did not compact the 75MB files, but we compacted the
> tombstone files, doesn't that leave us with the tombstone gone, but
> the data still intact in the 75MB file?

Also right.  Glad you had a look! :)

One relatively easy fix would be to only GC the tombstones if there
are no SSTables left for that CF older than the ones being compacted.
(So, a "major" compaction, which compacts all SSTables and is what
nodeprobe invokes, would always GC eligible tombstones.)
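
A rough sketch of that rule, with hypothetical names and fields (the actual change was tracked in CASSANDRA-604): tombstones may be purged only when no SSTable of the same column family, other than the ones being compacted, is older than the oldest file in the compaction.

    import java.util.Collection;

    // Illustrative only; not the code from the actual patch.
    class TombstonePurgeRule {
        static class SSTableInfo {
            final long createdAtMillis;
            SSTableInfo(long createdAtMillis) { this.createdAtMillis = createdAtMillis; }
        }

        static boolean mayPurgeTombstones(Collection<SSTableInfo> compacting,
                                          Collection<SSTableInfo> allForCf) {
            long oldestCompacting = Long.MAX_VALUE;
            for (SSTableInfo s : compacting)
                oldestCompacting = Math.min(oldestCompacting, s.createdAtMillis);
            for (SSTableInfo s : allForCf) {
                // an older, untouched SSTable may still hold the data a tombstone shadows
                if (!compacting.contains(s) && s.createdAtMillis < oldestCompacting)
                    return false;
            }
            // a major compaction includes every SSTable, so it always passes this check
            return true;
        }
    }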

-Jonathan

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
Starting with fresh directories with no data and trying to do simple
inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
decided to dig deeper into the code.

I was looking at the code for compaction, and this is a very noob
concern, so please bear with me if I'm way off; this code is all new
to me. When we are doing compactions during the normal course of
Cassandra's operation, we call:

            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
            {
                if (sstables.size() < minThreshold)
                {
                    continue;
                }
                // otherwise, do the compaction for this bucket ...

where getCompactionBuckets groups very small files into one bucket, and
otherwise groups files that are within 0.5x-1.5x of each other's sizes. It
will only compact a bucket holding at least minThreshold files, which is 4 by default.
So far so good. Now how about this scenario: I have an old entry that
I inserted a long time ago and that was compacted into a 75MB file.
There are fewer than 4 such 75MB files. I do many deletes, and I end up with 4
extra SSTable files filled with tombstones, each about 300 MB.
These 4 files are compacted together, and in the compaction code, if
the column is a tombstone we don't copy it over to the new file. Now
since we did not compact the 75MB files, but we did compact the
tombstone files, doesn't that leave us with the tombstone gone but
the data still intact in the 75MB file? Or did I miss the part of the code
where the original data is removed? If we compacted all the
files together I don't think that would be a problem, but since we
only compact 4 at a time, wouldn't that potentially leave data uncleaned?
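
To make that concrete, here is a rough sketch of the bucketing rule described above (very small files lumped together, otherwise files within roughly 0.5x-1.5x of a bucket's average size); the names and thresholds are illustrative, not the real getCompactionBuckets:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only.
    class BucketSketch {
        // Group file sizes into buckets of similar size; a bucket is only compacted
        // if it ends up holding at least minThreshold (4 by default) files.
        static List<List<Long>> bucketBySize(List<Long> fileSizes, long smallFileThreshold) {
            List<List<Long>> buckets = new ArrayList<List<Long>>();
            for (long size : fileSizes) {
                List<Long> target = null;
                for (List<Long> bucket : buckets) {
                    long avg = average(bucket);
                    boolean similar = size > avg / 2 && size < (avg * 3) / 2;  // 0.5x .. 1.5x of the average
                    boolean bothSmall = size < smallFileThreshold && avg < smallFileThreshold;
                    if (similar || bothSmall) { target = bucket; break; }
                }
                if (target == null) { target = new ArrayList<Long>(); buckets.add(target); }
                target.add(size);
            }
            return buckets;
        }

        static long average(List<Long> bucket) {
            long sum = 0;
            for (long s : bucket) sum += s;
            return sum / bucket.size();
        }
    }

Under a rule like this the lone 75MB file never lands in a bucket of at least 4 files, so no compaction ever rewrites it, which is exactly the gap described above.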

Again sorry if I am way off.

Thanks
Ray




On Fri, Dec 4, 2009 at 12:52 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Okay, in that case it doesn't hurt to update just in case but I think
> you're going to need that test case. :)
>
> On Fri, Dec 4, 2009 at 2:45 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> I have a two week old version of trunk. Probably need to update it to
>> latest build.
>>
>> On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Are you testing trunk?  If not, you should check that first to see if
>>> it's already fixed.
>>>
>>> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>> Just to be clear what I meant is that I ran the deletions and
>>>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>>>> for the tombstones to expire.
>>>> Anyway I will try to make a simpler test case to hopefully reproduce
>>>> this, and I will share the code if I can reproduce.
>>>>
>>>> Ray
>>>>
>>>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>> Hi Jonathan I have changed that to 3600(one hour) based on your
>>>>> recommendation before.
>>>>>
>>>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> this is what I was referring to by "the period specified in your config file":
>>>>>>
>>>>>>  <!--
>>>>>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>>>>>   ~ a large enough value that you are confident that the deletion marker
>>>>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>>>>   ~ ten days.
>>>>>>  -->
>>>>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>>>>
>>>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>>>> data on the cluster by running remove on every single key I entered,
>>>>>>> and I run major compaction
>>>>>>> nodeprobe -host hostname compact on a certain node, and after the
>>>>>>> compaction is over, I am left with one data file/ one index file and
>>>>>>> the bloom filter file,
>>>>>>> and they are the same size of data as before I started doing the deletes.
>>>>>>>
>>>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>>>
>>>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>>>> case, is the data being removed and replaced by tombstones? and will
>>>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>>>>>>>> each node of a 4 node cluster.
>>>>>>>>>> I ran a script that first calls a get on each column inserted followed
>>>>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>>>>> nodes will go down and eventually become 0. The disk space
>>>>>>>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>>>>>>>> node. Am I missing something here?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for your help
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
Okay, in that case it doesn't hurt to update just in case, but I think
you're going to need that test case. :)

On Fri, Dec 4, 2009 at 2:45 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> I have a two week old version of trunk. Probably need to update it to
> latest build.
>
> On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Are you testing trunk?  If not, you should check that first to see if
>> it's already fixed.
>>
>> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>> Just to be clear what I meant is that I ran the deletions and
>>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>>> for the tombstones to expire.
>>> Anyway I will try to make a simpler test case to hopefully reproduce
>>> this, and I will share the code if I can reproduce.
>>>
>>> Ray
>>>
>>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>> Hi Jonathan I have changed that to 3600(one hour) based on your
>>>> recommendation before.
>>>>
>>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> this is what I was referring to by "the period specified in your config file":
>>>>>
>>>>>  <!--
>>>>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>>>>   ~ a large enough value that you are confident that the deletion marker
>>>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>>>   ~ ten days.
>>>>>  -->
>>>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>>>
>>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>>> data on the cluster by running remove on every single key I entered,
>>>>>> and I run major compaction
>>>>>> nodeprobe -host hostname compact on a certain node, and after the
>>>>>> compaction is over, I am left with one data file/ one index file and
>>>>>> the bloom filter file,
>>>>>> and they are the same size of data as before I started doing the deletes.
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>>
>>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>>> case, is the data being removed and replaced by tombstones? and will
>>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>>>>>>> each node of a 4 node cluster.
>>>>>>>>> I ran a script that first calls a get on each column inserted followed
>>>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>>>> nodes will go down and eventually become 0. The disk space
>>>>>>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>>>>>>> node. Am I missing something here?
>>>>>>>>>
>>>>>>>>> Thanks a lot for your help
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
I have a two-week-old version of trunk. I probably need to update it to
the latest build.

On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Are you testing trunk?  If not, you should check that first to see if
> it's already fixed.
>
> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> Just to be clear what I meant is that I ran the deletions and
>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>> for the tombstones to expire.
>> Anyway I will try to make a simpler test case to hopefully reproduce
>> this, and I will share the code if I can reproduce.
>>
>> Ray
>>
>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rr...@playdom.com> wrote:
>>> Hi Jonathan I have changed that to 3600(one hour) based on your
>>> recommendation before.
>>>
>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> this is what I was referring to by "the period specified in your config file":
>>>>
>>>>  <!--
>>>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>>>   ~ a large enough value that you are confident that the deletion marker
>>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>>   ~ ten days.
>>>>  -->
>>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>>
>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>> data on the cluster by running remove on every single key I entered,
>>>>> and I run major compaction
>>>>> nodeprobe -host hostname compact on a certain node, and after the
>>>>> compaction is over, I am left with one data file/ one index file and
>>>>> the bloom filter file,
>>>>> and they are the same size of data as before I started doing the deletes.
>>>>>
>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>> case, is the data being removed and replaced by tombstones? and will
>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>>>>>> each node of a 4 node cluster.
>>>>>>>> I ran a script that first calls a get on each column inserted followed
>>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>>> nodes will go down and eventually become 0. The disk space
>>>>>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>>>>>> node. Am I missing something here?
>>>>>>>>
>>>>>>>> Thanks a lot for your help
>>>>>>>> Ray
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
Are you testing trunk?  If not, you should check that first to see if
it's already fixed.

On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> Just to be clear what I meant is that I ran the deletions and
> compaction with GCGraceSeconds set to 1 hour, so there was enough time
> for the tombstones to expire.
> Anyway I will try to make a simpler test case to hopefully reproduce
> this, and I will share the code if I can reproduce.
>
> Ray
>
> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rr...@playdom.com> wrote:
>> Hi Jonathan I have changed that to 3600(one hour) based on your
>> recommendation before.
>>
>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> this is what I was referring to by "the period specified in your config file":
>>>
>>>  <!--
>>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>>   ~ a large enough value that you are confident that the deletion marker
>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>   ~ ten days.
>>>  -->
>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>
>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>> I think there might be a bug in the deletion logic. I removed all the
>>>> data on the cluster by running remove on every single key I entered,
>>>> and I run major compaction
>>>> nodeprobe -host hostname compact on a certain node, and after the
>>>> compaction is over, I am left with one data file/ one index file and
>>>> the bloom filter file,
>>>> and they are the same size of data as before I started doing the deletes.
>>>>
>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>>> suppress the older writes, and when compaction occurs the data and
>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>
>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>> case, is the data being removed and replaced by tombstones? and will
>>>>>> they all be deleted eventually when compaction runs?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>>>>> each node of a 4 node cluster.
>>>>>>> I ran a script that first calls a get on each column inserted followed
>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>> nodes will go down and eventually become 0. The disk space
>>>>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>>>>> node. Am I missing something here?
>>>>>>>
>>>>>>> Thanks a lot for your help
>>>>>>> Ray
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
Just to be clear, what I meant is that I ran the deletions and
compaction with GCGraceSeconds set to 1 hour, so there was enough time
for the tombstones to expire.
Anyway, I will try to make a simpler test case that hopefully reproduces
this, and I will share the code if I can reproduce it.

Ray

On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rr...@playdom.com> wrote:
> Hi Jonathan I have changed that to 3600(one hour) based on your
> recommendation before.
>
> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>> this is what I was referring to by "the period specified in your config file":
>>
>>  <!--
>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>   ~ a large enough value that you are confident that the deletion marker
>>   ~ will be propagated to all replicas by the time this many seconds has
>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>   ~ ten days.
>>  -->
>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>
>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>> I think there might be a bug in the deletion logic. I removed all the
>>> data on the cluster by running remove on every single key I entered,
>>> and I run major compaction
>>> nodeprobe -host hostname compact on a certain node, and after the
>>> compaction is over, I am left with one data file/ one index file and
>>> the bloom filter file,
>>> and they are the same size of data as before I started doing the deletes.
>>>
>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>> suppress the older writes, and when compaction occurs the data and
>>>> tombstones get GC'd (after the period specified in your config file).
>>>>
>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>> so I am guessing these are tombstones being written? If that's the
>>>>> case, is the data being removed and replaced by tombstones? and will
>>>>> they all be deleted eventually when compaction runs?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>>>> each node of a 4 node cluster.
>>>>>> I ran a script that first calls a get on each column inserted followed
>>>>>> by a remove. Since I was basically removing every entry
>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>> nodes will go down and eventually become 0. The disk space
>>>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>>>> node. Am I missing something here?
>>>>>>
>>>>>> Thanks a lot for your help
>>>>>> Ray
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
Hi Jonathan, I have changed that to 3600 (one hour) based on your
earlier recommendation.

On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> this is what I was referring to by "the period specified in your config file":
>
>  <!--
>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>   ~ a large enough value that you are confident that the deletion marker
>   ~ will be propagated to all replicas by the time this many seconds has
>   ~ elapsed, even in the face of hardware failures.  The default value is
>   ~ ten days.
>  -->
>  <GCGraceSeconds>864000</GCGraceSeconds>
>
> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> I think there might be a bug in the deletion logic. I removed all the
>> data on the cluster by running remove on every single key I entered,
>> and I run major compaction
>> nodeprobe -host hostname compact on a certain node, and after the
>> compaction is over, I am left with one data file/ one index file and
>> the bloom filter file,
>> and they are the same size of data as before I started doing the deletes.
>>
>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> cassandra never modifies data in-place.  so it writes tombstones to
>>> suppress the older writes, and when compaction occurs the data and
>>> tombstones get GC'd (after the period specified in your config file).
>>>
>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>> so I am guessing these are tombstones being written? If that's the
>>>> case, is the data being removed and replaced by tombstones? and will
>>>> they all be deleted eventually when compaction runs?
>>>>
>>>>
>>>>
>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>>> each node of a 4 node cluster.
>>>>> I ran a script that first calls a get on each column inserted followed
>>>>> by a remove. Since I was basically removing every entry
>>>>> I inserted before, I expected that the disk space occupied by the
>>>>> nodes will go down and eventually become 0. The disk space
>>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>>> node. Am I missing something here?
>>>>>
>>>>> Thanks a lot for your help
>>>>> Ray
>>>>>
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
This is what I was referring to by "the period specified in your config file":

  <!--
   ~ Time to wait before garbage-collecting deletion markers.  Set this to
   ~ a large enough value that you are confident that the deletion marker
   ~ will be propagated to all replicas by the time this many seconds has
   ~ elapsed, even in the face of hardware failures.  The default value is
   ~ ten days.
  -->
  <GCGraceSeconds>864000</GCGraceSeconds>
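
As a small illustration of how that setting gates collection (a hypothetical helper, not Cassandra's internals), a deletion marker only becomes collectable once it is older than now minus GCGraceSeconds:

    // Illustrative only.
    class GcGrace {
        // A tombstone written at markerTimestampMillis may be garbage-collected during
        // compaction only after gcGraceSeconds have elapsed since it was written.
        static boolean isCollectable(long markerTimestampMillis, int gcGraceSeconds, long nowMillis) {
            long gcBefore = nowMillis - gcGraceSeconds * 1000L;
            return markerTimestampMillis < gcBefore;
        }
    }

With the default of 864000 seconds, a marker written today stays on disk for at least ten days, even after every replica has seen it.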

On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> I think there might be a bug in the deletion logic. I removed all the
> data on the cluster by running remove on every single key I entered,
> and I run major compaction
> nodeprobe -host hostname compact on a certain node, and after the
> compaction is over, I am left with one data file/ one index file and
> the bloom filter file,
> and they are the same size of data as before I started doing the deletes.
>
> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> cassandra never modifies data in-place.  so it writes tombstones to
>> suppress the older writes, and when compaction occurs the data and
>> tombstones get GC'd (after the period specified in your config file).
>>
>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>> Looking at jconsole I see a high number of writes when I do removes,
>>> so I am guessing these are tombstones being written? If that's the
>>> case, is the data being removed and replaced by tombstones? and will
>>> they all be deleted eventually when compaction runs?
>>>
>>>
>>>
>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>>> Hi all,
>>>>
>>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>>> each node of a 4 node cluster.
>>>> I ran a script that first calls a get on each column inserted followed
>>>> by a remove. Since I was basically removing every entry
>>>> I inserted before, I expected that the disk space occupied by the
>>>> nodes will go down and eventually become 0. The disk space
>>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>>> node. Am I missing something here?
>>>>
>>>> Thanks a lot for your help
>>>> Ray
>>>>
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
I think there might be a bug in the deletion logic. I removed all the
data on the cluster by running remove on every single key I entered,
and then I ran a major compaction
(nodeprobe -host hostname compact) on a certain node. After the
compaction is over, I am left with one data file, one index file, and
the bloom filter file,
and they hold the same amount of data as before I started doing the deletes.

On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> cassandra never modifies data in-place.  so it writes tombstones to
> suppress the older writes, and when compaction occurs the data and
> tombstones get GC'd (after the period specified in your config file).
>
> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> Looking at jconsole I see a high number of writes when I do removes,
>> so I am guessing these are tombstones being written? If that's the
>> case, is the data being removed and replaced by tombstones? and will
>> they all be deleted eventually when compaction runs?
>>
>>
>>
>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>>> Hi all,
>>>
>>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>>> each node of a 4 node cluster.
>>> I ran a script that first calls a get on each column inserted followed
>>> by a remove. Since I was basically removing every entry
>>> I inserted before, I expected that the disk space occupied by the
>>> nodes will go down and eventually become 0. The disk space
>>> actually goes up when I do the bulk removes to about 1.8 gigs per
>>> node. Am I missing something here?
>>>
>>> Thanks a lot for your help
>>> Ray
>>>
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
Cassandra never modifies data in place.  So it writes tombstones to
suppress the older writes, and when compaction occurs the data and
tombstones get GC'd (after the period specified in your config file).
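
A minimal sketch of that behaviour, assuming a hypothetical cell model rather than Cassandra's actual classes: a remove is just a newer write with no value, and both the marker and the data it shadows stay on disk until a compaction that sees them (and is past the grace period) rewrites the files.

    // Illustrative cell model; not Cassandra's internal representation.
    class Cell {
        final byte[] value;    // null means this cell is a tombstone
        final long timestamp;
        Cell(byte[] value, long timestamp) { this.value = value; this.timestamp = timestamp; }
        boolean isTombstone() { return value == null; }
    }

    class CompactionSketch {
        // Merging two versions of the same column: the newest write wins, tombstone or not.
        static Cell reconcile(Cell a, Cell b) {
            return a.timestamp >= b.timestamp ? a : b;
        }

        // A winning tombstone older than the GC horizon is dropped entirely at compaction time,
        // which is when the space for both the marker and the shadowed data is finally reclaimed.
        static Cell afterCompaction(Cell merged, long gcBeforeMillis) {
            if (merged.isTombstone() && merged.timestamp < gcBeforeMillis)
                return null;   // write nothing to the new SSTable
            return merged;
        }
    }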

On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> Looking at jconsole I see a high number of writes when I do removes,
> so I am guessing these are tombstones being written? If that's the
> case, is the data being removed and replaced by tombstones? and will
> they all be deleted eventually when compaction runs?
>
>
>
> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
>> Hi all,
>>
>> I ran a test where I inserted about 1.2 Gigabytes worth of data into
>> each node of a 4 node cluster.
>> I ran a script that first calls a get on each column inserted followed
>> by a remove. Since I was basically removing every entry
>> I inserted before, I expected that the disk space occupied by the
>> nodes will go down and eventually become 0. The disk space
>> actually goes up when I do the bulk removes to about 1.8 gigs per
>> node. Am I missing something here?
>>
>> Thanks a lot for your help
>> Ray
>>
>

Re: Removes increasing disk space usage in Cassandra?

Posted by Ramzi Rabah <rr...@playdom.com>.
Looking at jconsole, I see a high number of writes when I do removes,
so I am guessing these are tombstones being written? If that's the
case, is the data being removed and replaced by tombstones, and will
they all be deleted eventually when compaction runs?



On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rr...@playdom.com> wrote:
> Hi all,
>
> I ran a test where I inserted about 1.2 Gigabytes worth of data into
> each node of a 4 node cluster.
> I ran a script that first calls a get on each column inserted followed
> by a remove. Since I was basically removing every entry
> I inserted before, I expected that the disk space occupied by the
> nodes will go down and eventually become 0. The disk space
> actually goes up when I do the bulk removes to about 1.8 gigs per
> node. Am I missing something here?
>
> Thanks a lot for your help
> Ray
>