Posted to user@cassandra.apache.org by Python_Max <py...@gmail.com> on 2018/01/10 13:59:00 UTC

Too many tombstones using TTL

Hello, C* users and experts.

I have (one more) question about tombstones.

Consider the following example:
cqlsh> create keyspace test_ttl with replication = {'class':
'SimpleStrategy', 'replication_factor': '1'}; use test_ttl;
cqlsh> create table items(a text, b text, c1 text, c2 text, c3 text,
primary key (a, b));
cqlsh> insert into items(a,b,c1,c2,c3) values('AAA', 'BBB', 'C111', 'C222',
'C333') using ttl 60;
bash$ nodetool flush
bash$ sleep 60
bash$ nodetool compact test_ttl items
bash$ sstabledump mc-2-big-Data.db

[
  {
    "partition" : {
      "key" : [ "AAA" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 58,
        "clustering" : [ "BBB" ],
        "liveness_info" : { "tstamp" : "2018-01-10T13:29:25.777Z", "ttl" :
60, "expires_at" : "2018-01-10T13:30:25Z", "expired" : true },
        "cells" : [
          { "name" : "c1", "deletion_info" : { "local_delete_time" :
"2018-01-10T13:29:25Z" }
          },
          { "name" : "c2", "deletion_info" : { "local_delete_time" :
"2018-01-10T13:29:25Z" }
          },
          { "name" : "c3", "deletion_info" : { "local_delete_time" :
"2018-01-10T13:29:25Z" }
          }
        ]
      }
    ]
  }
]

The question is: why does Cassandra create a tombstone for every column
instead of a single tombstone per row?

In my production environment I have a table with ~30 columns, and it gives
me a warning for 30k tombstones and 300 live rows. That is 30 times more
than it needs to be.
Can this behavior be tuned in some way?

Thanks.

-- 
Best regards,
Python_Max.

Re: Too many tombstones using TTL

Posted by kurt greaves <ku...@instaclustr.com>.
You should be able to avoid querying the tombstones if it's time series
data. Using TWCS, just make sure you don't query data that you know is
expired (assuming you have a time component in your clustering key).
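
A minimal sketch of that approach (the table layout and names here are
illustrative, not something from this thread):

cqlsh> create table events(sensor text, ts timeuuid, value text,
primary key (sensor, ts)) with compaction = {'class':
'TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS',
'compaction_window_size': 1} and default_time_to_live = 3600;
cqlsh> -- read only data younger than the TTL, so expired cells are never scanned
cqlsh> select * from events where sensor = 'S1'
and ts > maxTimeuuid('2018-01-10 13:00+0000');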

Re: Too many tombstones using TTL

Posted by "Charulata Sharma (charshar)" <ch...@cisco.com.INVALID>.
Hi,
    I have struggled a lot with tombstones and finally learnt the following:

- Deletes are not the only operation that causes tombstones. Check whether you are inserting nulls into any of the table columns. If so, and you use prepared statements, you can leave those values unset instead of binding null (see the sketch below).

- You can force garbage collection on a specific table (e.g. with nodetool garbagecollect), and this makes a huge difference.

(You can read my blog on this. I have mentioned all the steps that we carried out.)
https://medium.com/cassandra-tombstones-clearing-use-case/the-curios-case-of-tombstones-d897f681a378
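
A minimal sketch of the unset idea, using the DataStax Python driver
(requires protocol v4, i.e. Cassandra 2.2+; the cluster address is
illustrative):

from cassandra.cluster import Cluster
from cassandra.query import UNSET_VALUE

session = Cluster(['127.0.0.1']).connect('test_ttl')
prepared = session.prepare(
    "insert into items(a, b, c1, c2, c3) values (?, ?, ?, ?, ?)")

# Binding None here would write tombstones into c2 and c3;
# UNSET_VALUE leaves those cells untouched instead.
session.execute(prepared, ('AAA', 'BBB', 'C111', UNSET_VALUE, UNSET_VALUE))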




Thanks,
Charu




Re: Too many tombstones using TTL

Posted by Python_Max <py...@gmail.com>.
Thanks for a very helpful reply.
Will try to refactor the code accordingly.


-- 
Best regards,
Python_Max.

Re: Too many tombstones using TTL

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
I would not plan on deleting data at the row level, as you'll end up with a
lot of tombstones eventually (and you won't even notice them).
It's not healthy to allow that many tombstones to be read, and while your
latency may fit your SLA now, it may not in the future.
Tombstones are going to create a lot of heap pressure and eventually
trigger long GC pauses, which then tend to affect the whole cluster (a slow
node is worse than a down node).

You should definitely separate TTLed data from non-TTLed data in different
tables so that you can adjust compaction strategies, gc_grace_seconds and
read patterns accordingly. I understand that this will add complexity to
your code, but it will prevent severe performance issues in Cassandra.

Tombstones won't be a problem for repair; they get repaired just like
regular cells. They mostly affect the read path negatively, and they use
space on disk.
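
A minimal sketch of what a dedicated table for the TTLed data could look
like (the values are illustrative; lowering gc_grace_seconds is only safe
if you repair at least that often):

cqlsh> create table items_ttl(a text, b text, c1 text, c2 text, c3 text,
primary key (a, b)) with default_time_to_live = 36000
and gc_grace_seconds = 3600 and compaction = {'class':
'TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS',
'compaction_window_size': 1};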


-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Too many tombstones using TTL

Posted by Python_Max <py...@gmail.com>.
Hello.

I was planning to remove a row (not a partition).

Most of the tombstones are seen in the use case of a geographic grid with
X:Y as the partition key and an object id (timeuuid) as the clustering key,
where objects can be temporary (with a TTL of about 10 hours) or fully
persistent.
When I select all objects in a specific X:Y I can even hit the 100k
(default) tombstone limit for some X:Y. I have raised this limit to 500k,
since the 99.9p read latency is < 75ms, so I should not (?) need to care how
many tombstones there are as long as read latency is fine.
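
A minimal sketch of such a schema (the names are illustrative):

cqlsh> create table grid_objects(x int, y int, id timeuuid, data text,
primary key ((x, y), id));
cqlsh> -- temporary object, expires after ~10 hours
cqlsh> insert into grid_objects(x, y, id, data) values(1, 2, now(), 'tmp')
using ttl 36000;
cqlsh> -- fully persistent object, no TTL
cqlsh> insert into grid_objects(x, y, id, data) values(1, 2, now(), 'perm');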

Splitting entities into temporary and permanent tables and using different
compaction strategies is an option, but it would lead to code duplication
and 2x read queries.

Is my assumption correct that tombstones are not such a big problem as long
as read latency and disk usage are okay? Do tombstones affect repair time
(using Reaper)?

Thanks.



-- 
Best regards,
Python_Max.

Re: Too many tombstones using TTL

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Hi,

Could you be more specific about the deletes you're planning to perform?
This will end up moving your problem somewhere else, as you'll be generating
new tombstones (and if you're planning on deleting rows, be aware that
row-level tombstones aren't reported anywhere in the metrics, logs or query
traces).
Currently you can delete your data at the partition level, which will
create a single tombstone that shadows all your expired (and non-expired)
data and is very efficient. The read path is optimized for such tombstones:
the data won't be fully read from disk nor exchanged between replicas. But
that is of course only if your use case allows you to delete full
partitions.

We usually model so that we can restrict our reads to live data.
If you're creating time series, your clustering key should include a
timestamp, which you can use to avoid reading expired data. If your TTL is
set to 60 days, you can read only data that is strictly younger than that.
You can also partition by time ranges and access only partitions that have
no chance of being expired yet.
Those techniques usually work better with TWCS, but the former could make
you hit a lot of SSTables if your partitions can spread over all time
buckets, so only use TWCS if you can restrict individual reads to at most 4
time windows.
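
With the items table from the original example, a partition-level delete is
a single statement; the time-bucketed table below is an illustrative
sketch, not something from this thread:

cqlsh> -- a single partition tombstone shadows everything under 'AAA'
cqlsh> delete from items where a = 'AAA';
cqlsh> -- time-bucketed variant: partition by day so reads never touch
cqlsh> -- buckets that are already fully expired
cqlsh> create table events_by_day(day date, ts timeuuid, value text,
primary key (day, ts));
cqlsh> select * from events_by_day where day = '2018-01-16'
and ts > maxTimeuuid('2018-01-16 10:00+0000');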

Cheers,



-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Too many tombstones using TTL

Posted by Python_Max <py...@gmail.com>.
Hi.

Thank you very much for the detailed explanation.
It seems there is nothing I can do about it except deleting records by key
instead of letting them expire.



-- 
Best regards,
Python_Max.

Re: Too many tombstones using TTL

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Hi,

As DuyHai said, different TTLs could theoretically be set for different
cells of the same row. And one TTLed cell could be shadowing another cell
that has no TTL (say you forgot to set a TTL and set one afterwards by
performing an update), or vice versa.
One cell could also be missing from a node without Cassandra knowing. So
turning an incomplete row that only has expired cells into a tombstone row
could lead to wrong results being returned at read time: the tombstone row
could potentially shadow a valid live cell on another replica.

Cassandra needs to retain each TTLed cell and send it to replicas during
reads to cover all possible cases.
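
For example, with the items table from the original post, a later update
can give a single cell its own TTL (a sketch of the "set a TTL afterwards"
case):

cqlsh> insert into items(a, b, c1, c2, c3) values('AAA', 'BBB', 'C1', 'C2', 'C3');
cqlsh> -- only c2 will expire; the row and its other cells stay live,
cqlsh> -- so expiration has to be tracked per cell
cqlsh> update items using ttl 60 set c2 = 'C2-new' where a = 'AAA' and b = 'BBB';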



-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Too many tombstones using TTL

Posted by Python_Max <py...@gmail.com>.
Thank you for the response.

I know about the option of setting a TTL per column, or even per item in a
collection. However, in my example the entire row has expired; shouldn't
Cassandra be able to detect this situation and create a single tombstone
for the entire row instead of many?
Is there any reason not to do this, other than that no one needs it? Is
this suitable for a feature request or improvement?

Thanks.


-- 
Best regards,
Python_Max.

Re: Too many tombstones using TTL

Posted by DuyHai Doan <do...@gmail.com>.
"The question is why Cassandra creates a tombstone for every column instead
of single tombstone per row?"

--> Simply because technically it is possible to set different TTL value on
each column of a CQL row
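
For example, with the items table from the question, each column can carry
its own TTL:

cqlsh> insert into items(a, b, c1) values('AAA', 'BBB', 'C111') using ttl 60;
cqlsh> update items using ttl 3600 set c2 = 'C222' where a = 'AAA' and b = 'BBB';
cqlsh> -- ttl() shows the remaining lifetime of each cell
cqlsh> select ttl(c1), ttl(c2) from items where a = 'AAA' and b = 'BBB';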
