Posted to user@cassandra.apache.org by Brian Tarbox <ta...@cabotresearch.com> on 2014/06/18 19:56:35 UTC

can I kill very old data files in my data folder (I know that sounds crazy but....)

I have a column family that only stores the last 5 days' worth of some
data, and yet I have files in the data directory for this CF that are 3
weeks old. They take the form:

keyspace-CFName-ic-nnnn-Filter.db
keyspace-CFName-ic-nnnn-Index.db
keyspace-CFName-ic-nnnn-Data.db
keyspace-CFName-ic-nnnn-Statistics.db
keyspace-CFName-ic-nnnn-TOC.txt
keyspace-CFName-ic-nnnn-Summary.db
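Each "nnnn" value identifies one SSTable, and the six files sharing it are that SSTable's components. A minimal sketch for grouping a directory listing by that field, assuming the `<keyspace>-<cf>-<version>-<generation>-<component>` layout shown above (here `ic` is the on-disk format version). The helper names are hypothetical, not part of any Cassandra tool.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SSTableComponent(NamedTuple):
    keyspace: str
    column_family: str
    version: str      # on-disk format version, e.g. "ic"
    generation: int   # the "nnnn" field; one value per SSTable
    component: str    # e.g. "Data.db", "Index.db", "TOC.txt"


def parse_sstable_filename(name: str) -> SSTableComponent:
    """Split a name of the assumed form <keyspace>-<cf>-<version>-<generation>-<component>."""
    # Keyspace and column family names may only contain letters, digits and
    # underscores, so '-' is safe to split on. Names with an extra field
    # (such as in-progress temporary files) are rejected rather than guessed at.
    parts = name.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected SSTable filename: {name!r}")
    keyspace, cf, version, generation, component = parts
    return SSTableComponent(keyspace, cf, version, int(generation), component)


def group_by_generation(names: List[str]) -> Dict[int, List[str]]:
    """Map each generation number to the sorted component files that belong to it."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name in names:
        parsed = parse_sstable_filename(name)
        groups[parsed.generation].append(parsed.component)
    return {gen: sorted(comps) for gen, comps in groups.items()}
```

Feeding it a listing of the CF's directory should yield one key per SSTable; six keys would match the six file groups described below.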

I have six groups of these files, each with a different nnnn value: five
with timestamps from each of the last five days, plus one group from 3
weeks ago. That makes me wonder whether that old group should have been
deleted but was not.
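SSTables are immutable, so every row in a file was written no later than the file itself. A file's modification time is therefore a lower bound on the age of everything in it (assuming the mtime was not reset by copying or restoring). Newer files produced by compaction can still contain old rows, so this only finds files that are wholly outside the retention window. A sketch, with hypothetical function names, that flags generations whose newest component is older than the window:

```python
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_DAY = 86_400


def stale_generations(files: Iterable[Tuple[int, float]], now: float,
                      retention_days: float) -> List[int]:
    """Return generations whose newest component file is older than the retention window.

    `files` is an iterable of (generation, mtime in epoch seconds) pairs.
    """
    newest: Dict[int, float] = {}
    for generation, mtime in files:
        # Track the most recent mtime seen for each generation.
        newest[generation] = max(mtime, newest.get(generation, mtime))
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(gen for gen, mtime in newest.items() if mtime < cutoff)
```

With a 5-day window, the group written 3 weeks ago is flagged while the five recent groups are not.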

The files are tens or hundreds of gigs, so deleting them would be good,
unless it's really bad!

Thanks,

Brian Tarbox

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

Posted by Jens Rantil <je...@tink.se>.
...and temporarily adding more nodes and rebalancing is not an option?


Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

Posted by Brian Tarbox <ta...@cabotresearch.com>.
I don't think I have the space to run a major compaction right now (I'm
already above 50% disk usage), and I think compaction can take extra
space?
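The concern is well founded as a worst case: compaction writes the new SSTable before removing its inputs, so if little or nothing is purgeable the disk briefly holds both. A minimal worst-case check, assuming the output is as large as the input (function names are hypothetical; `shutil.disk_usage` is the standard-library call):

```python
import shutil

GiB = 1024 ** 3


def major_compaction_fits(free_bytes: int, input_bytes: int) -> bool:
    """Worst-case check: assume the compacted output is as large as its input.

    The input SSTables are only removed after the new one is complete, so in
    the worst case (nothing purgeable) the disk must hold both at once.
    """
    return free_bytes >= input_bytes


def fits_on_path(data_dir: str, input_bytes: int) -> bool:
    """Same check against the live free space of the filesystem holding data_dir."""
    return major_compaction_fits(shutil.disk_usage(data_dir).free, input_bytes)
```

For example, on a hypothetical 1000 GiB disk with 550 GiB used (450 GiB free), a 500 GiB column family fails the check and a 400 GiB one passes. In this thread's case much of the input is deleted data, so the real output should be smaller, but the check cannot assume that.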


Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

Posted by Robert Coli <rc...@eventbrite.com>.
Sure, though in your case (because you're using size-tiered compaction, and
so can) I'd probably just run a major compaction.

=Rob
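A major compaction is triggered per node with `nodetool compact`, optionally naming a keyspace and column family. Under size-tiered compaction this merges all of that CF's SSTables on the node, and rows shadowed by newer deletions are discarded during the merge. A small wrapper sketch; the function names are hypothetical, and `keyspace`/`CFName` are the placeholders from the original post:

```python
import subprocess
from typing import List


def major_compaction_cmd(keyspace: str, column_family: str,
                         host: str = "127.0.0.1") -> List[str]:
    """Build the nodetool invocation for a major compaction of one column family."""
    return ["nodetool", "-h", host, "compact", keyspace, column_family]


def run_major_compaction(keyspace: str, column_family: str,
                         host: str = "127.0.0.1") -> None:
    # Blocks until nodetool returns; raises CalledProcessError on a non-zero exit.
    subprocess.run(major_compaction_cmd(keyspace, column_family, host), check=True)
```

Running it one node at a time limits how many nodes are short on disk and I/O at once.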

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

Posted by Brian Tarbox <ta...@cabotresearch.com>.
Rob,
Thank you! We are not using TTL; we're manually deleting data more than 5
days old for this CF. We're running 1.2.13 and are using size-tiered
compaction (this CF is append-only, i.e. zero updates).
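A plausible reason an old file can linger under size-tiered compaction: SSTables are only merged with others of similar size, and only once enough of them accumulate. The sketch below is a simplified model of that bucketing, using what I understand to be the STCS defaults (bucket_low 0.5, bucket_high 1.5, min_sstable_size 50 MB, min_threshold 4). The real strategy has further details, such as capping files per compaction, and the sizes in the example are hypothetical.

```python
from typing import List

GiB = 1024 ** 3


def stcs_buckets(sizes: List[int], bucket_low: float = 0.5, bucket_high: float = 1.5,
                 min_sstable_size: int = 50 * 1024 ** 2) -> List[List[int]]:
    """Group SSTable sizes roughly the way size-tiered compaction does (simplified)."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A file joins a bucket if its size is close to the bucket's average,
            # or if both it and the bucket are below the "small file" threshold.
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(buckets: List[List[int]], min_threshold: int = 4) -> List[List[int]]:
    """Only buckets holding at least min_threshold files are eligible for compaction."""
    return [b for b in buckets if len(b) >= min_threshold]
```

With five files of 18 to 22 GiB and one of 300 GiB, the small files form one eligible bucket while the 300 GiB file sits alone and is never picked, so any deleted rows inside it keep occupying disk.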

If I understand you correctly, it sounds like we can get away with a
rolling (stop, delete old data files, restart) process.

Thanks,

Brian
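The rolling process Brian describes can be sketched as an ordered plan. One point worth making explicit: generation numbers are assigned independently on each node, so the stale generations have to be identified per node. This is a hypothetical planning helper; `nodetool drain` and `nodetool status` are real commands, while the stop/start steps are placeholders because they depend on how Cassandra is installed.

```python
from typing import Dict, List, Tuple


def rolling_cleanup_plan(stale_by_node: Dict[str, List[int]]) -> List[Tuple[str, str]]:
    """Return (node, step) pairs for a one-node-at-a-time stop/delete/restart.

    Generation numbers are local to each node, so the stale generations must be
    identified separately on every node.
    """
    steps: List[Tuple[str, str]] = []
    for node in sorted(stale_by_node):
        generations = stale_by_node[node]
        if not generations:
            continue  # nothing to remove on this node; leave it running
        steps.append((node, "nodetool drain"))  # flush memtables, stop accepting writes
        steps.append((node, "stop the Cassandra process"))  # placeholder: install-specific
        for gen in sorted(generations):
            steps.append((node, f"delete all components of generation {gen}"))
        steps.append((node, "start the Cassandra process"))  # placeholder: install-specific
        steps.append((node, "wait until nodetool status shows the node as UN"))
    return steps
```

Finishing each node (including waiting for it to report Up/Normal) before touching the next keeps at most one replica down at a time.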


Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox <ta...@cabotresearch.com>
wrote:

> I have a column family that only stores the last 5 days worth of some
> data...and yet I have files in the data directory for this CF that are 3
> weeks old.
>

Are you using TTL? If so:

https://issues.apache.org/jira/browse/CASSANDRA-6654

Are you using size-tiered or leveled compaction?

> I have six bunches of these file groups, each with a different nnnn
> value...and with timestamps of each of the last five days...plus one group
> from 3 weeks ago...which makes me wonder if that group  somehow should have
> been deleted but were not.
>
> The files are tens or hundreds of gigs so deleting would be good, unless
> its really bad!
>

Data files can't be deleted from the data directory while Cassandra is
running, but it should be fine (if probably technically unsupported) to
delete them with Cassandra stopped. In most cases you don't want to do so,
because you might unmask deleted rows or get unexpected consistency behavior.
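Why deleting files can unmask deleted rows: a read reconciles every SSTable that holds a row, and a deletion is just a newer tombstone sitting in a different file. The toy model below (per-row rather than per-column, single node, no gc_grace_seconds; the names are hypothetical) shows both failure modes. In this thread's case the 3-week-old file plays the role of `old`, but since every row in it is already past the 5-day window, losing those rows is exactly what is wanted.

```python
from typing import Dict, List, Optional, Tuple

# (write timestamp, value); a value of None stands for a tombstone (deletion marker)
Cell = Tuple[int, Optional[str]]


def merged_read(sstables: List[Dict[str, Cell]]) -> Dict[str, str]:
    """Reconcile rows across SSTables: the newest timestamp wins, tombstones win ties."""
    latest: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            current = latest.get(key)
            if (current is None or cell[0] > current[0]
                    or (cell[0] == current[0] and cell[1] is None)):
                latest[key] = cell
    # Rows whose winning cell is a tombstone are invisible to reads.
    return {key: value for key, (_, value) in latest.items() if value is not None}


old = {"row1": (100, "a"), "row2": (100, "b")}
newer = {"row1": (200, None)}  # row1 was deleted after it was written

print(merged_read([old, newer]))  # {'row2': 'b'}
print(merged_read([old]))         # {'row1': 'a', 'row2': 'b'}  (removing `newer` resurrects row1)
print(merged_read([newer]))       # {}  (removing `old` also drops row2, which was still live)
```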

In your case, you know that no data in files created 3 weeks ago can
possibly have any value, so it is safe to delete them.

=Rob
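For the deletion step itself, a cautious sketch that removes every component of one generation together, so no partial SSTable is left behind. It assumes the filename layout shown in the original post, takes the directory holding this CF's files as `cf_dir`, matches the generation field exactly (so generation 10 never matches 101), and defaults to a dry run. The helper names are hypothetical; it is only meant to run with Cassandra stopped on that node.

```python
from pathlib import Path
from typing import List


def components_for_generation(cf_dir: Path, generation: int) -> List[Path]:
    """List the files in cf_dir that belong to one SSTable generation."""
    matches = []
    for path in sorted(cf_dir.iterdir()):
        parts = path.name.split("-")
        # Assumed layout: <keyspace>-<cf>-<version>-<generation>-<component>
        if len(parts) == 5 and parts[3] == str(generation):
            matches.append(path)
    return matches


def delete_generation(cf_dir: str, generation: int, dry_run: bool = True) -> List[Path]:
    """Remove every component of one generation. Only run with Cassandra stopped.

    With dry_run=True (the default) nothing is deleted; the would-be targets are returned.
    """
    targets = components_for_generation(Path(cf_dir), generation)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Checking the dry-run output (six files, all with the old timestamp) before passing `dry_run=False` is a cheap guard against a typo in the generation number.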