Posted to user@cassandra.apache.org by Abhishek Singh <ab...@gmail.com> on 2018/06/19 10:28:46 UTC

Tombstone

Hi all,
           We using Cassandra for storing events which are time series
based for batch processing once a particular batch based on hour is
processed we delete the entries but we were left with almost 18% deletes
marked as Tombstones.
                 I ran compaction on the particular CF tombstone didn't
come down.
            Can anyone suggest what is the optimal tunning/recommended
practice used for compaction strategy and GC_grace period with 100k entries
and deletes every hour.

Warm Regards
Abhishek Singh

Re: Tombstone

Posted by Evelyn Smith <u5...@gmail.com>.
Use TimeWindowCompactionStrategy and don’t delete the data; you should be relying on Cassandra to drop whole SSTables once the data inside them has expired.

That 18% is probably waiting on gc_grace. This shouldn’t be an issue if you are letting TWCS drop the data rather than running deletes.
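
Something along these lines, for example (a sketch only; keyspace, table,
and column names are made up, and you’d pick a window size and TTL that
match your batch cadence):

    -- Hypothetical schema: one partition per (hour, bucket), TWCS with
    -- 1-hour windows, and a default TTL so rows expire on their own
    -- instead of being deleted explicitly.
    CREATE TABLE events.hourly_batch (
        hour    timestamp,
        bucket  int,
        id      timeuuid,
        payload text,
        PRIMARY KEY ((hour, bucket), id)
    ) WITH compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'HOURS',
          'compaction_window_size': '1'
      }
      AND default_time_to_live = 10800;  -- 3 hours, in seconds

Once everything in a window has expired, TWCS can drop the whole SSTable
without compacting tombstones row by row.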

Regards,
Evelyn.

> On 19 Jun 2018, at 8:28 pm, Abhishek Singh <ab...@gmail.com> wrote:
> [snip]




RE: [EXTERNAL] Re: Tombstone

Posted by Rahul Singh <ra...@gmail.com>.
Queues can be implemented in Cassandra, even though everyone believes they are an “anti-pattern”, if the design is tailored to Cassandra’s model.

In this case, I would do a logical / soft delete on the data to invalidate it from the queries that access it, and put a TTL on the data so it is deleted automatically later. You could have a default TTL, or set a TTL on your actual “delete”, which would push the physical delete into the future, for example 3 days from now.
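
A rough sketch of the idea, assuming a table keyed by (hour, bucket) as
described elsewhere in the thread, plus a boolean invalidated column (all
names here are made up):

    -- Write events with a TTL so Cassandra expires them on its own
    -- (259200 s = 3 days).
    INSERT INTO events.hourly_batch (hour, bucket, id, payload, invalidated)
    VALUES ('2018-06-19 10:00:00+0000', 7, now(), '...', false)
    USING TTL 259200;

    -- "Soft delete": flip a flag the read path filters on, instead of
    -- issuing a real DELETE. The flag cell carries its own TTL, so it
    -- expires later as well.
    UPDATE events.hourly_batch USING TTL 259200
    SET invalidated = true
    WHERE hour = '2018-06-19 10:00:00+0000' AND bucket = 7
      AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;  -- illustrative id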

Some sources of inspiration on how people have been doing queues on Cassandra:

- Cherami, by Uber
- CMB, by Comcast
- cassieq — don’t remember.



--
Rahul Singh
rahul.singh@anant.us

Anant Corporation
On Jun 19, 2018, 12:39 PM -0400, Durity, Sean R <SE...@homedepot.com>, wrote:
> [snip]

RE: [EXTERNAL] Re: Tombstone

Posted by "Durity, Sean R" <SE...@homedepot.com>.
This sounds like a queue pattern, which is typically an anti-pattern for Cassandra. I would say that it is very difficult to get the access patterns, tombstones, and everything else lined up properly to solve a queue problem.


Sean Durity

From: Abhishek Singh <ab...@gmail.com>
Sent: Tuesday, June 19, 2018 10:41 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Tombstone

[snip]

Re: Tombstone

Posted by Abhishek Singh <ab...@gmail.com>.
                       The partition key is made of a datetime (basically
the date truncated to the hour) and a bucket. I think your RCA may be
correct: since we are deleting the partition's rows one by one rather than
in a batch, files may be overlapping for that particular partition. A
scheduled thread picks the rows for a partition based on the current
datetime and bucket number and checks whether each row's entry is past
due; if yes, we trigger an event and remove the entry.
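
For reference, the scheduled thread effectively does this today, which
leaves one row-level tombstone per processed entry (names here are
illustrative, not our real schema):

    -- Per-row delete issued by the scheduled thread: one tombstone
    -- per processed entry inside the partition.
    DELETE FROM events.hourly_batch
    WHERE hour = '2018-06-19 10:00:00+0000' AND bucket = 7
      AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;

    -- Deleting the whole (hour, bucket) partition once it is fully
    -- processed would leave a single partition-level tombstone instead:
    DELETE FROM events.hourly_batch
    WHERE hour = '2018-06-19 10:00:00+0000' AND bucket = 7;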



On Tue 19 Jun, 2018, 7:58 PM Jeff Jirsa, <jj...@gmail.com> wrote:

> [snip]

Re: Tombstone

Posted by Jeff Jirsa <jj...@gmail.com>.
The most likely explanation is tombstones in files that won’t be collected, as they potentially overlap data in other files with a lower timestamp (especially true if your partition key doesn’t change and you’re writing and deleting data within a partition).
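
If you keep the explicit deletes, the usual knobs are the single-SSTable
tombstone compaction sub-properties. A sketch (table name made up, and the
strategy shown is just an example; keep whatever class you run today and
verify the options against your version’s docs):

    ALTER TABLE events.hourly_batch WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',   -- keep your current class
        'unchecked_tombstone_compaction': 'true',  -- don't skip on overlap pre-check
        'tombstone_threshold': '0.2'               -- droppable ratio that triggers it
    };

Tombstones still have to be older than gc_grace_seconds before they can be
purged, so check that as well.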

-- 
Jeff Jirsa


> On Jun 19, 2018, at 3:28 AM, Abhishek Singh <ab...@gmail.com> wrote:
> [snip]

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org