Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2013/02/11 16:21:58 UTC

Deleting old items

Hi,

Is there an easy way to delete old or unused data?

I know about TTL but there are 2 limitations of TTL:

- AFAIK, there is no TTL on counter columns
- A TTL needs to be defined at write time, so it's too late for data already
inserted.

I could also use a standard "delete", but that seems inappropriate for such a
massive amount of data.

In some cases, I don't know the row key and would like to delete all the
rows whose key starts with, let's say, "1050#..."

Even better, I understand that columns are always inserted in C* as
(name, value, timestamp). So is it possible to delete all the data inserted
in some CF between 2 dates, or data older than 1 month?

Alain

Re: Deleting old items

Posted by aaron morton <aa...@thelastpickle.com>.
I'll email the docs people. 

I believe the docs are saying "use compaction throttling rather than max_compaction_threshold", not "this setting does nothing".

That said, I used it within the last month on a machine with very little RAM to limit compaction memory use.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:05 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> "Can you point to the docs."
> 
> http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold
> 
> And thanks about the rest of your answers, once again ;-).
> 
> Alain


Re: Deleting old items

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
"Can you point to the docs."

http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold

And thanks again for the rest of your answers ;-).

Alain


2013/2/16 aaron morton <aa...@thelastpickle.com>

>  Is that a feature that could possibly be developed one day ?
>
> No.
> Timestamps are essentially internal implementation used to resolve
> different values for the same column.
>
> With "min_compaction_level_threshold" did you mean "
> min_compaction_threshold"  ? If so, why should I do that, what are the
> advantage/inconvenient of reducing this value ?
>
> Yes, min_compaction_threshold, my bad.
> If you have a wide row and delete a lot of values you will end up with a
> lot of tombstones. These may dramatically reduce the read performance until
> they are purged. Reducing the compaction threshold makes compaction happen
> more frequently.
>
> Looking at the doc I saw that: "max_compaction_threshold: Ignored in
> Cassandra 1.1 and later.". How to ensure that I'll always keep a small
> amount of SSTables then ?
>
> AFAIK it's not.
> There may be some confusion about the location of the settings in CLI vs
> CQL.
> Can you point to the docs.
>
> Cheers

Re: Deleting old items

Posted by aaron morton <aa...@thelastpickle.com>.
> Is that a feature that could possibly be developed one day?
No.
Timestamps are essentially an internal implementation detail, used to resolve different values for the same column.
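
To illustrate what "resolve different values for the same column" means, here is a minimal Python sketch of last-write-wins reconciliation. It is a toy model, not Cassandra's code: `Column` and `reconcile` are hypothetical names, and the handling of equal timestamps is an arbitrary choice made for the example.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # write timestamp stored alongside the value


def reconcile(a: Column, b: Column) -> Column:
    """Return the winning version: the one with the higher timestamp.

    Ties go to `a` here, which is an arbitrary choice for this sketch.
    """
    return a if a.timestamp >= b.timestamp else b


# Two replicas hold different values for the same column.
replica_1 = Column("email", "old@example.com", timestamp=1000)
replica_2 = Column("email", "new@example.com", timestamp=2000)

winner = reconcile(replica_1, replica_2)
print(winner.value)
```

In this model the timestamp is only ever consulted when two versions of the same column are compared; nothing is indexed by it, so there is no cheap way to ask for "everything written before time T".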

> With "min_compaction_level_threshold" did you mean "min_compaction_threshold"? If so, why should I do that? What are the advantages and drawbacks of reducing this value?
Yes, min_compaction_threshold, my bad.
If you have a wide row and delete a lot of values, you will end up with a lot of tombstones. These may dramatically reduce read performance until they are purged. Reducing the compaction threshold makes compaction happen more frequently, which gives tombstones older than gc_grace_seconds more chances to be purged.
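
As a rough illustration of why a lower threshold means more frequent compaction, here is a toy Python model. It assumes a single bucket of equally sized SSTables and ignores everything else real size-tiered compaction does; `count_compactions` is a hypothetical helper, not a Cassandra API.

```python
def count_compactions(flushes: int, min_threshold: int) -> int:
    """Count merges of freshly flushed SSTables in a one-bucket toy model."""
    pending = 0       # small SSTables waiting in the bucket
    compactions = 0
    for _ in range(flushes):
        pending += 1  # each memtable flush adds one small SSTable
        if pending >= min_threshold:
            pending = 0  # the merged output is assumed to leave this bucket
            compactions += 1
    return compactions


# Compare the usual default of 4 with the suggested value of 2.
for threshold in (4, 2):
    print(threshold, count_compactions(12, threshold))
```

With the same number of flushes, the lower threshold triggers twice as many merges in this model. The cost is extra compaction I/O, which is the trade-off to weigh before lowering the setting.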

> Looking at the doc I saw that: "max_compaction_threshold: Ignored in Cassandra 1.1 and later.". How can I ensure that I always keep a small number of SSTables, then?
AFAIK it's not ignored.
There may be some confusion about the location of the settings in CLI vs CQL. 
Can you point to the docs. 

Cheers


On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Hi Aaron, once again thanks for this answer.
>> "So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month ?"
> "No. "
> 
> Why is there no way of deleting or getting data using the internal timestamp stored alongside of any inserted column (as described here: http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns) ? Is that a feature that could possibly be developed one day ? It could be useful to perform delete of old data or to bring to a dev cluster just the last week of data for example.
> 
> With "min_compaction_level_threshold" did you mean "min_compaction_threshold"  ? If so, why should I do that, what are the advantage/inconvenient of reducing this value ?
> 
> Looking at the doc I saw that: "max_compaction_threshold: Ignored in Cassandra 1.1 and later.". How to ensure that I'll always keep a small amount of SSTables then ? Why is this deprecated ?
> 
> Alain


Re: Deleting old items

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Aaron, once again thanks for this answer.

"So is it possible to delete all the data inserted in some CF between 2
dates or data older than 1 month?"

"No. "

Why is there no way of deleting or reading data using the internal
timestamp stored alongside every inserted column (as described here:
http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns)? Is
that a feature that could possibly be developed one day? It could
be useful for deleting old data, or for copying just the last week of data
to a dev cluster, for example.

With "min_compaction_level_threshold" did you mean "min_compaction_threshold"?
If so, why should I do that? What are the advantages and drawbacks of
reducing this value?

Looking at the doc I saw that: "max_compaction_threshold: Ignored in
Cassandra 1.1 and later.". How can I ensure that I always keep a small
number of SSTables, then? Why is this deprecated?

Alain


2013/2/12 aaron morton <aa...@thelastpickle.com>

> So is it possible to delete all the data inserted in some CF between 2
> dates or data older than 1 month ?
>
> No.
>
> You need to issue row level deletes. If you don't know the row key you'll
> need to do range scans to locate them.
>
> If you are deleting parts of wide rows consider reducing the
> min_compaction_level_threshold on the CF to 2
>
> Cheers

Re: Deleting old items

Posted by aaron morton <aa...@thelastpickle.com>.
> So is it possible to delete all the data inserted in some CF between 2 dates, or data older than 1 month?
No.

You need to issue row-level deletes. If you don't know the row keys, you'll need to do range scans to locate them.

If you are deleting parts of wide rows consider reducing the min_compaction_level_threshold on the CF to 2
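
The scan-then-delete approach can be sketched as follows. This is a minimal Python illustration that uses an in-memory dict as a stand-in for a column family; `scan_keys` and `delete_rows_with_prefix` are hypothetical helpers, not calls from any Cassandra client. A real client would page through the key range over the network and send one delete per matching row.

```python
from typing import Dict, Iterator, List

# Stand-in for a column family: row key -> {column name: value}.
Rows = Dict[str, Dict[str, str]]


def scan_keys(cf: Rows, page_size: int = 2) -> Iterator[List[str]]:
    """Yield row keys page by page, as a paged range scan would."""
    keys = list(cf)  # snapshot, so deleting while iterating is safe
    for start in range(0, len(keys), page_size):
        yield keys[start:start + page_size]


def delete_rows_with_prefix(cf: Rows, prefix: str) -> List[str]:
    """Scan every row key and issue a row-level delete for each match."""
    deleted = []
    for page in scan_keys(cf):
        for key in page:
            if key.startswith(prefix):
                del cf[key]  # one delete per row
                deleted.append(key)
    return deleted


cf: Rows = {
    "1050#a": {"col": "v1"},
    "1051#a": {"col": "v2"},
    "1050#b": {"col": "v3"},
    "2000#x": {"col": "v4"},
}
deleted = delete_rows_with_prefix(cf, "1050#")
print(deleted, sorted(cf))
```

Note that the sketch examines every key. With a hash-based partitioner, keys are not returned in key order, so the prefix cannot be used to narrow the scan.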

Cheers



On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Hi,
> 
> I would like to know if there is a way to delete old/unused data easily ?
> 
> I know about TTL but there are 2 limitations of TTL:
> 
> - AFAIK, there is no TTL on counter columns
> - TTL need to be defined at write time, so it's too late for data already inserted.
> 
> I also could use a standard "delete" but it seems inappropriate for such a massive.
> 
> In some cases, I don't know the row key and would like to delete all the rows starting by, let's say, "1050#..." 
> 
> Even better, I understood that columns are always inserted in C* with (name, value, timestamp). So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month ?
> 
> Alain