Posted to dev@cassandra.apache.org by Boris Yen <yu...@gmail.com> on 2013/05/16 10:07:13 UTC

Major compaction does not seem to free much disk space when wide rows are used.

Hi All,

Sorry for the wide distribution.

Our Cassandra cluster is running 1.0.10. Recently we have been facing a
strange situation. We have a column family containing wide rows (each row
might have a few million columns). We delete columns on a daily basis, and
we also run a major compaction on the column family every day to free up
disk space (gc_grace is set to 600 seconds).

However, every time we run the major compaction, only 1 or 2 GB of disk
space is freed. We tried deleting most of the data before running the
compaction, but the result was pretty much the same.

So we checked the source code. It seems that column tombstones can only be
purged when the row key does not appear in any sstable outside the
compaction. I know a major compaction is supposed to include all sstables;
however, in our use case columns are inserted rapidly, which makes Cassandra
flush memtables to disk and create new sstables while the compaction is
running. These newly created sstables contain the same keys as the sstables
being compacted (the compaction takes 2 or 3 hours to finish). My question:
could these newly created sstables be the reason most of the column
tombstones are not purged?
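
To make the purge rule described above concrete, here is a minimal sketch
in Java. It is illustrative only, not the actual Cassandra source; the
names (SSTableLike, canPurge, and so on) are invented for this example.

    // Illustrative sketch only, not the actual Cassandra source.
    import java.util.Set;

    interface SSTableLike {
        // Bloom-filter-style membership check: false positives are possible,
        // false negatives are not.
        boolean mayContain(byte[] rowKey);
    }

    final class TombstonePurgeSketch {
        /**
         * A gc_grace-expired tombstone may be dropped during compaction only
         * if no sstable OUTSIDE the compaction set might still hold the same
         * row key.
         */
        static boolean canPurge(byte[] rowKey,
                                Set<SSTableLike> sstablesOutsideCompaction,
                                long tombstoneDeletionTimeSec,
                                long gcGraceSec,
                                long nowSec) {
            // 1. The tombstone must be older than gc_grace.
            if (nowSec < tombstoneDeletionTimeSec + gcGraceSec)
                return false;

            // 2. No sstable outside the compaction may contain the row key.
            //    With wide rows that are written continuously, sstables
            //    flushed after the compaction started almost always contain
            //    every hot row key, so this check fails and the tombstone
            //    is kept.
            for (SSTableLike sstable : sstablesOutsideCompaction) {
                if (sstable.mayContain(rowKey))
                    return false;
            }
            return true;
        }
    }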

P.S. We also ran another test: we inserted data into the same CF with the
same wide-row pattern and deleted most of it, but this time we stopped all
writes to Cassandra before running the compaction. The disk usage decreased
dramatically.

Any suggestions, or is this a known issue?

Thanks and Regards,
Boris

Re: Major compaction does not seem to free much disk space when wide rows are used.

Posted by Boris Yen <yu...@gmail.com>.
Thank you for the reply. It is really helpful.

We will take a look at the patch to see whether we can apply it to the 1.0
branch, or try to work around the issue by changing our application.

Regards,
Boris

On Thu, May 16, 2013 at 10:43 PM, Yuki Morishita <mo...@gmail.com> wrote:

> You are right about the behavior of Cassandra compaction: it checks
> whether the key exists in other SSTable files that are not in the
> compaction set.
>
> I think https://issues.apache.org/jira/browse/CASSANDRA-4671 would help
> if you upgrade to the latest 1.2, but on your version I think the
> workaround is to stop writes so that nothing new gets flushed, then
> compact.
>
> --
> Yuki Morishita
>  t:yukim (http://twitter.com/yukim)
>

Re: Major compaction does not seem to free much disk space when wide rows are used.

Posted by Yuki Morishita <mo...@gmail.com>.
You are right about the behavior of Cassandra compaction: it checks
whether the key exists in other SSTable files that are not in the
compaction set.

I think https://issues.apache.org/jira/browse/CASSANDRA-4671 would help
if you upgrade to the latest 1.2, but on your version I think the
workaround is to stop writes so that nothing new gets flushed, then
compact.
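
Roughly, the idea behind CASSANDRA-4671 can be sketched like this. This is
an illustrative sketch of the concept, not the real implementation, and
names such as SSTableWithStats and minDataTimestamp are invented here.

    // Illustrative sketch of the idea behind CASSANDRA-4671.
    import java.util.Set;

    interface SSTableWithStats {
        boolean mayContain(byte[] rowKey);  // bloom-filter-style check
        long minDataTimestamp();            // oldest write timestamp in this sstable
    }

    final class TimestampAwarePurgeSketch {
        /**
         * A tombstone older than gc_grace can be purged even when the row key
         * overlaps sstables outside the compaction, as long as the tombstone
         * is also older than everything stored in those sstables (it cannot
         * shadow anything there, so dropping it is safe).
         */
        static boolean canPurge(byte[] rowKey,
                                long tombstoneTimestamp,
                                Set<SSTableWithStats> sstablesOutsideCompaction,
                                long tombstoneDeletionTimeSec,
                                long gcGraceSec,
                                long nowSec) {
            if (nowSec < tombstoneDeletionTimeSec + gcGraceSec)
                return false;  // still inside gc_grace

            for (SSTableWithStats sstable : sstablesOutsideCompaction) {
                // The key may exist elsewhere, but only older-or-equal data
                // there keeps the tombstone alive.
                if (sstable.mayContain(rowKey)
                        && sstable.minDataTimestamp() <= tombstoneTimestamp)
                    return false;
            }
            return true;
        }
    }

In the wide-row scenario above, sstables flushed while the compaction runs
contain only data newer than yesterday's tombstones, so under this rule
their overlapping keys would no longer block the purge.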

-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)

Re: Major compaction does not seem to free much disk space when wide rows are used.

Posted by Edward Capriolo <ed...@gmail.com>.
This makes sense. Unless you are running major compaction a delete could
only happen if the bloom filters confirmed the row was not in the sstables
not being compacted. If your rows are wide the odds are that they are in
most/all sstables and then finally removing them would be tricky.


Re: Major compaction does not seem to free much disk space when wide rows are used.

Posted by "Louvet, Jacques" <Ja...@cable.comcast.com>.
Boris,

We hit exactly the same issue, and you are correct: the newly created
SSTables are the reason most of the column tombstones are not being purged.

There is an improvement in the 1.2 train where both the minimum and maximum
timestamps for a row are now stored and used during compaction to determine
whether that portion of the row can be purged.
However, this only appears to help partially; the other restriction on major
compaction, that all the files containing the deleted rows must be part of
the compaction for the row to be purged, still remains.

We have switched to column deletes rather than row deletes wherever
practical. It is a little more work in the app, but a big improvement in
reads due to much more efficient compaction.
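
For illustration, the difference between the two kinds of delete looks
roughly like this with the Thrift API of that era. The host, port, keyspace,
column family, row key, and column name below are placeholders, and the
snippet is a sketch rather than production code.

    // Rough illustration of a row-level vs. a column-level delete using the
    // Thrift interface available around Cassandra 1.0.
    import java.nio.ByteBuffer;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class DeleteExample {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport =
                    new TFramedTransport(new TSocket("localhost", 9160));
            transport.open();
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("my_keyspace");

            ByteBuffer rowKey = ByteBuffer.wrap("some_row".getBytes("UTF-8"));
            long timestamp = System.currentTimeMillis() * 1000;  // microseconds

            // Row-level delete: no column on the path, so a row tombstone
            // covering the whole row is written.
            ColumnPath wholeRow = new ColumnPath("my_cf");
            client.remove(rowKey, wholeRow, timestamp, ConsistencyLevel.QUORUM);

            // Column-level delete: the path names a single column, so only a
            // tombstone for that one column is written.
            ColumnPath oneColumn = new ColumnPath("my_cf");
            oneColumn.setColumn(ByteBuffer.wrap("some_column".getBytes("UTF-8")));
            client.remove(rowKey, oneColumn, timestamp, ConsistencyLevel.QUORUM);

            transport.close();
        }
    }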

Regards,
Jacques
