Posted to user@cassandra.apache.org by Ali Akhtar <al...@gmail.com> on 2016/11/08 10:04:12 UTC

Improving performance where a lot of updates and deletes are required?

I have a use case where a lot of updates and deletes to a table will be
necessary.

The deletes will be done at a scheduled time, probably at the end of the
day, each day.

Updates will be done throughout the day, as new data comes in.

Are there any guidelines on improving Cassandra's performance for this use
case? Any caveats to be aware of? Any tips, like running nodetool repair
every X days?

Thanks.

Re: Improving performance where a lot of updates and deletes are required?

Posted by Hannu Kröger <hk...@gmail.com>.
Also if they are being read before compaction:
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html

Hannu
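Hannu's point is that expired TTL data already behaves like deleted data on reads, before any compaction has rewritten it. A deliberately simplified Python model of that read-time check might look like the following. The `Cell` type and its fields are hypothetical, and this is not Cassandra's storage engine code:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    value: str
    write_time: int             # seconds since the epoch when the cell was written
    ttl: Optional[int] = None   # None: the cell never expires


def is_live(cell: Cell, now: int) -> bool:
    """A TTL'd cell stops being visible once write_time + ttl has passed,
    whether or not a compaction has rewritten it yet."""
    return cell.ttl is None or now < cell.write_time + cell.ttl


def read(cells: List[Cell], now: int) -> List[str]:
    """Return the visible values; expired cells are skipped like tombstones."""
    return [c.value for c in cells if is_live(c, now)]


cells = [Cell("a", write_time=0, ttl=60), Cell("b", write_time=0)]
print(read(cells, now=30))   # ['a', 'b']
print(read(cells, now=120))  # ['b']
```

The expired cell is filtered out at read time, but it is still stored until compaction removes it. A read therefore has to step over it, much as it would over a tombstone.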

> On 8 Nov 2016, at 16.36, DuyHai Doan <do...@gmail.com> wrote:
> 
> "Does TTL also cause tombstones?" --> Yes, after the TTL expires, at the next compaction the TTLed column is replaced by a tombstone, as per my understanding


Re: Improving performance where a lot of updates and deletes are required?

Posted by DuyHai Doan <do...@gmail.com>.
"Does TTL also cause tombstones?" --> Yes, after the TTL expires, at the
next compaction the TTLed column is replaced by a tombstone, as per my
understanding
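The following is a rough worked timeline of what DuyHai describes. It assumes the default `gc_grace_seconds` of 864000 (10 days) and a hypothetical one-week TTL. This is a simplification: the actual purge also needs a compaction that includes the expired cell.

```python
from datetime import datetime, timedelta, timezone

# Cassandra's default gc_grace_seconds table setting: 864000 s = 10 days.
DEFAULT_GC_GRACE_SECONDS = 864_000


def expiry_and_purge_times(written_at: datetime, ttl_seconds: int,
                           gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS):
    """Simplified timeline for one cell written with a TTL.

    expires_at:   from here on the cell reads as deleted (tombstone-like).
    purgeable_at: earliest time a compaction may drop it from disk; removal
                  still waits for a compaction that actually covers the cell.
    """
    expires_at = written_at + timedelta(seconds=ttl_seconds)
    purgeable_at = expires_at + timedelta(seconds=gc_grace_seconds)
    return expires_at, purgeable_at


written = datetime(2016, 11, 8, 10, 0, tzinfo=timezone.utc)
expires, purgeable = expiry_and_purge_times(written, ttl_seconds=7 * 86_400)
print(expires.isoformat())    # 2016-11-15T10:00:00+00:00
print(purgeable.isoformat())  # 2016-11-25T10:00:00+00:00
```

`gc_grace_seconds` is a per-table option. For the explicit DELETEs in the original question, the usual guidance is to run `nodetool repair` more often than `gc_grace_seconds`. That way every replica has seen a tombstone before it can be purged; otherwise deleted data can reappear.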

On Tue, Nov 8, 2016 at 3:32 PM, Ali Akhtar <al...@gmail.com> wrote:

> Does TTL also cause tombstones?

Re: Improving performance where a lot of updates and deletes are required?

Posted by Vladimir Yudovin <vl...@winguzone.com>.
Yes, as the doc says, "Expired data is marked with a tombstone", but you save the round trip to the host and the processing of a DELETE statement.

Best regards, Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.

---- On Tue, 08 Nov 2016 09:32:16 -0500 Ali Akhtar <ali.rac200@gmail.com> wrote ----

Does TTL also cause tombstones?

Re: Improving performance where a lot of updates and deletes are required?

Posted by Ali Akhtar <al...@gmail.com>.
Does TTL also cause tombstones?

On Tue, Nov 8, 2016 at 6:57 PM, Vladimir Yudovin <vl...@winguzone.com>
wrote:

> >The deletes will be done at a scheduled time, probably at the end of the
> day, each day.
>
> Probably you can use TTL? http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html

Re: Improving performance where a lot of updates and deletes are required?

Posted by Vladimir Yudovin <vl...@winguzone.com>.
>The deletes will be done at a scheduled time, probably at the end of the day, each day.

Probably you can use TTL? http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html

Best regards, Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.

Re: Improving performance where a lot of updates and deletes are required?

Posted by Alain Rastoul <al...@gmail.com>.
On 11/08/2016 08:52 PM, Alain Rastoul wrote:
> For example, if you had to track the position of a lot of objects,
> instead of updating the object records, each second you could insert a
> new event with: (object: object_id, event_type: position_move,
> position: x, y).
>

and add a timestamp, of course,
and optionally TTL the data, with a descending clustering order on that timestamp.


-- 
best,
Alain
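The following is a toy in-memory model of the layout Alain suggests. It keeps one partition per object and stores events newest first, which is the role a descending clustering order on the timestamp plays. Every event also carries a TTL. `PositionEvent`, `EventLog` and the one-hour TTL are hypothetical names and values. In Cassandra, the ordering and expiry would come from the table definition rather than from application code.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(order=True)
class PositionEvent:
    ts: int                       # event time in seconds; the clustering key
    x: float = field(compare=False)
    y: float = field(compare=False)


class EventLog:
    """Stand-in for a table partitioned by object_id and clustered by ts DESC.
    Writes only ever append; nothing is updated in place."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self._parts = defaultdict(list)  # object_id -> events, ts ascending

    def append(self, object_id: str, event: PositionEvent) -> None:
        bisect.insort(self._parts[object_id], event)  # keeps ts order

    def history(self, object_id: str, now: int) -> list:
        """Unexpired events, newest first (what DESC clustering returns)."""
        live = [e for e in self._parts[object_id] if now < e.ts + self.ttl]
        return live[::-1]

    def current_position(self, object_id: str, now: int):
        """Latest state is the first row of the newest-first history."""
        events = self.history(object_id, now)
        return (events[0].x, events[0].y) if events else None


log = EventLog(ttl_seconds=3600)
log.append("obj1", PositionEvent(ts=10, x=1.0, y=2.0))
log.append("obj1", PositionEvent(ts=20, x=3.0, y=4.0))
print(log.current_position("obj1", now=30))  # (3.0, 4.0)
```

Nothing is updated in place: each move is a new row, and the current position is simply the newest live event.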

Re: Improving performance where a lot of updates and deletes are required?

Posted by Alain Rastoul <al...@gmail.com>.
On 11/08/2016 11:05 AM, DuyHai Doan wrote:
> Are you sure Cassandra is a good fit for this kind of heavy update &
> delete scenario ?
>
+1
This sounds like a relational-thinking scenario... (no offense, I like
relational systems)
It sounds as if you want to maintain the state of a lot of entities through
updates & deletes, with frequent state changes for those entities.

Maybe an event store/DDD approach would be a better model for that?

You could have an aggregate for each entity (i.e. a record) in your
system and insert a new event record on each update of that aggregate.

For example, if you had to track the position of a lot of objects,
instead of updating the object records, each second you could insert a
new event with: (object: object_id, event_type: position_move,
position: x, y).

Just a suggestion.

-- 
best,
Alain

Re: Improving performance where a lot of updates and deletes are required?

Posted by Ali Akhtar <al...@gmail.com>.
Yes, because there will also be a lot of inserts, and the linear
scalability that c* offers is required.

But the inserts aren't static, and the data that comes in will need to be
updated in response to user events.

Data which hasn't been touched for over a week has to be deleted.
(It's sensitive data, so it's better to delete it once it's out of date
than to keep storing it.)

I couldn't really do the weekly tables without massively complicating my
report generation, as the entire dataset needs to be queried to generate
certain reports.

So my question is really about how to get the best out of c* in this sort
of scenario.
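One way to apply Vladimir's TTL suggestion to the one-week rule above is to rewrite the whole row with a fresh TTL each time it is touched. Rows that go untouched for seven days then expire without a scheduled DELETE job. The following minimal sketch only builds the CQL string. The table and column names are hypothetical, and executing the statement would go through whichever driver is in use.

```python
from datetime import timedelta

# Assumed retention rule, taken from the message above: drop rows untouched for a week.
RETENTION = timedelta(days=7)


def ttl_seconds(retention: timedelta) -> int:
    """CQL's USING TTL clause expects a whole number of seconds."""
    return int(retention.total_seconds())


def upsert_with_ttl(table: str, columns: list, retention: timedelta = RETENTION) -> str:
    """Build a CQL INSERT (with ? bind markers for a prepared statement) that
    rewrites every listed column with a fresh TTL, restarting the retention
    clock for the whole row. `table` and `columns` are hypothetical names."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return (f"INSERT INTO {table} ({cols}) VALUES ({marks}) "
            f"USING TTL {ttl_seconds(retention)}")


print(upsert_with_ttl("user_data", ["id", "payload", "updated_at"]))
# INSERT INTO user_data (id, payload, updated_at) VALUES (?, ?, ?) USING TTL 604800
```

TTLs apply per written cell, so an UPDATE that sets only some columns refreshes only those cells. Rewriting every column with INSERT ... USING TTL restarts the clock for the whole row. Cassandra also has a `default_time_to_live` table option for a table-wide TTL. Expired cells still become tombstones, as discussed earlier in this thread.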

On Tue, Nov 8, 2016 at 3:05 PM, DuyHai Doan <do...@gmail.com> wrote:

> Are you sure Cassandra is a good fit for this kind of heavy update &
> delete scenario ?
>
> Otherwise, you can always use several tables (one table/day, rotating
> through 7 days for a week) and do a truncate of the table at the end of the
> day.

Re: Improving performance where a lot of updates and deletes are required?

Posted by DuyHai Doan <do...@gmail.com>.
Are you sure Cassandra is a good fit for this kind of heavy update & delete
scenario?

Otherwise, you can always use several tables (one table per day, rotating
through 7 tables for a week) and truncate the oldest table at the end of the
day.
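DuyHai's rotating-table idea reduces to simple date arithmetic. The following sketch assumes seven identically defined tables named with a hypothetical `events_day` prefix (`events_day0` to `events_day6`).

```python
from datetime import date, timedelta

N_TABLES = 7  # one table per day, reused every week


def table_for(day: date, prefix: str = "events_day") -> str:
    """Daily table that receives writes on `day` (hypothetical naming scheme)."""
    return f"{prefix}{day.toordinal() % N_TABLES}"


def table_to_truncate(day: date, prefix: str = "events_day") -> str:
    """At the end of `day`, empty the table that tomorrow's writes will reuse.
    It still holds the data written six days earlier, the oldest day kept."""
    return table_for(day + timedelta(days=1), prefix)


def tables_to_query(prefix: str = "events_day") -> list:
    """A report over the whole retained week has to read every table."""
    return [f"{prefix}{i}" for i in range(N_TABLES)]


today = date(2016, 11, 8)
print("write to:", table_for(today))
print("truncate tonight:", table_to_truncate(today))
print("report reads:", tables_to_query())
```

Two caveats are worth checking against the Cassandra docs for the version in use. First, TRUNCATE requires all nodes to be up. Second, with `auto_snapshot` enabled (the default in cassandra.yaml), a snapshot of the data is taken before truncation. For sensitive data, that snapshot keeps the data around until it is removed with `nodetool clearsnapshot`. As Ali points out, reports over the whole week then have to read all seven tables.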
