Posted to user@cassandra.apache.org by Dimetrio <di...@flysoft.ru> on 2014/08/18 11:51:44 UTC

disk space and tombstones

In our Twitter-like application, users have their own timelines with
news from their subscriptions. To populate the timelines we use
fanout-on-write, but we are forced to trim them to keep free disk space
under control.

We use the wide-row pattern and trim the rows with a "DELETE by primary
key USING TIMESTAMP". But our efforts seem to have no effect: free disk
space keeps shrinking rapidly, even after compaction.

It is clear to us that this is not the best use case for Cassandra, but
maybe there is a way to decrease disk utilisation for this pattern?

Our cluster consists of 15 c3.4xlarge nodes with 300 GB of storage each.
The timeline files take up 170 GB on each node.
gc_grace is 300,
rf=3




Re: disk space and tombstones

Posted by tommaso barbugli <tb...@gmail.com>.
2014-08-18 13:25 GMT+02:00 clslrns <vi...@flysoft.ru>:

> That scheme assumes we have to read the counter value before writing
> something to the timeline. This is what we try to avoid as an
> anti-pattern.


You can work around reading the counter before every write, but I agree
it would be much better if disk space were reclaimed after compaction in
a more predictable, human-understandable way.



Re: disk space and tombstones

Posted by Vitaly Chirkov <vi...@flysoft.ru>.

DuyHai Doan wrote
> it looks like there is a need for a tool to take care of the bucketing
> switch

But I still can't understand why bucketing should be better than `DELETE
row USING TIMESTAMP`. It looks like the only source of truth on this
topic is the Cassandra source code.
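
For concreteness, the two variants I'm comparing (names and values
invented):

    -- Variant 1: one wide row per user, trimmed with a timestamped
    -- partition delete (what we do now).
    DELETE FROM timeline USING TIMESTAMP 1408356000000000
    WHERE user_id = 42;

    -- Variant 2: sharded/bucketed rows, where an exhausted bucket is
    -- retired with a plain partition delete.
    DELETE FROM timeline_sharded
    WHERE user_id = 42 AND shard = 17;

As far as I can tell, both produce a single partition-level tombstone,
hence my confusion.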




Re: disk space and tombstones

Posted by DuyHai Doan <do...@gmail.com>.
"That scheme assumes we have to read counter value before write something to
the timeline. This is what we try to avoid as an anti-pattern."

 Hummm, it looks like there is a need for a tool to take care of the
bucketing switch. I've seen a lot of use cases where people need to do wide
row bucketing but bumps into concurrency issues for keeping the "state" of
the current partition. Managing it client-side at application level can be
a nightmare.
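
One stateless alternative, just as a sketch (table and values are made
up): derive the bucket from the event time itself, so no counter and no
shared "current partition" state are needed:

    CREATE TABLE timeline (
        user_id  bigint,
        day      int,        -- days since epoch, computed from event time
        event_id timeuuid,
        body     text,
        PRIMARY KEY ((user_id, day), event_id)
    );

    -- Writers compute the bucket locally; no read before write.
    INSERT INTO timeline (user_id, day, event_id, body)
    VALUES (42, 16300, now(), 'some news item');

    -- Trimming = dropping whole old buckets with plain partition deletes.
    DELETE FROM timeline WHERE user_id = 42 AND day = 16270;

The catch is that reads then have to fan out over the last N buckets,
and per-user activity varies wildly, which is exactly why people want
size-based buckets and run into the state problem again.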



Re: disk space and tombstones

Posted by clslrns <vi...@flysoft.ru>.
That scheme assumes we have to read the counter value before writing
something to the timeline. This is what we try to avoid as an
anti-pattern.

By the way, is there any difference between slice-trimming one row and
the sharding pattern in terms of compaction? AFAIK, a delete with
timestamp by primary key also creates a single row tombstone.




Re: disk space and tombstones

Posted by tommaso barbugli <tb...@gmail.com>.
What about timeline versioning? Every time a timeline grows past X
columns, you bump its version (which should be part of its row key) and
start writing to the new row (though this will make your app
substantially more complex). AFAIK, reclaiming disk space for deleted
rows is far easier than reclaiming it for deleted columns.
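
A rough sketch of what I mean (names invented):

    CREATE TABLE timeline (
        user_id  bigint,
        version  int,       -- bumped once the current row exceeds X columns
        event_id timeuuid,
        body     text,
        PRIMARY KEY ((user_id, version), event_id)
    );

    -- Track the current version per user somewhere convenient.
    CREATE TABLE timeline_version (
        user_id bigint PRIMARY KEY,
        version int
    );

    -- Once version 4 is live, retire version 3 with one partition delete.
    DELETE FROM timeline WHERE user_id = 42 AND version = 3;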



Re: disk space and tombstones

Posted by Dimetrio <di...@flysoft.ru>.
No, it is in seconds




Re: disk space and tombstones

Posted by Dimetrio <di...@flysoft.ru>.
In our case a major compaction (using sstablelevelreset) would take 15
days for the 15 nodes, plus trimming time, so it turns into a
never-ending maintenance mode.




Re: disk space and tombstones

Posted by tommaso barbugli <tb...@gmail.com>.
I was in exactly the same situation; the only way I could reclaim disk
space for trimmed data was this:

very low gc_grace + size-tiered compaction + slice timestamp deletes +
major compaction
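
In commands, roughly (values are what worked for me; adjust to taste,
and note that a very low gc_grace is only safe if repairs and hinted
handoff can keep up, otherwise deleted data can come back):

    -- very low gc_grace + size-tiered compaction
    ALTER TABLE timeline
    WITH gc_grace_seconds = 300
    AND compaction = {'class': 'SizeTieredCompactionStrategy'};

    -- slice timestamp delete: a tombstone shadowing everything older
    -- than the given (microsecond) timestamp
    DELETE FROM timeline USING TIMESTAMP 1408356000000000
    WHERE user_id = 42;

then, once gc_grace has passed, a major compaction:

    nodetool compact <keyspace> <table>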



Re: disk space and tombstones

Posted by Rahul Neelakantan <ra...@rahul.be>.
Is that GC_grace 300 days?

Rahul Neelakantan
