Posted to user@cassandra.apache.org by Charlie Mason <ch...@gmail.com> on 2014/06/03 21:13:55 UTC

Consolidating records and TTL

Hi All.

I have a system that's going to make possibly several concurrent changes to
a running total. I know I could use a counter for this. However, I have
extra metadata I can store with the changes which would allow me to replay
the changes. If I use a counter and it loses some writes I can't recover
it, as I will only have its current total, not the extra metadata to know
where to replay from.

What I was planning to do was write each change of the value to a CQL table
with a Time UUID as a row level primary key as well as a partition key.
Then when I need to read the running total back I will do a query for all
the changes and add them up to get the total.

As there could be tens of thousands of these I want to have a period after
which these are consolidated. Most won't be anywhere near that, but a few
will, which I need to be able to support. So I was also going to have a
consolidated total table which holds the UUID of the values consolidated up
to. Since I can bound the query for the recent updates by the UUID I should
be able to avoid all the tombstones. So if the read encounters any changes
that can be consolidated, it inserts a new consolidated value and deletes
the newly consolidated changes.
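
Roughly, the kind of schema I have in mind is sketched below (the table and
column names are only illustrative):

-- Illustrative only; table and column names are placeholders.
create table changes (
    total_key   int,       -- partition key identifying one running total
    change_id   timeuuid,  -- per-change TimeUUID, clustering column
    delta       int,       -- amount this change adds to the total
    primary key (total_key, change_id)
);

create table consolidated (
    total_key        int,
    consolidated_to  timeuuid,  -- TimeUUID of the last change rolled into the total
    total            int,
    primary key (total_key)
);

-- Read path: fetch the consolidated total, then only the changes newer than it.
select * from consolidated where total_key = 42;
select * from changes
where total_key = 42
  and change_id > 4b39fd10-ecdc-11e3-8080-808080808080;  -- the consolidated_to value read above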

What I am slightly worried about is what happens if the consolidated value
insert fails but the deletes to the change records succeed. I would be left
with an inconsistent total indefinitely. I have come up with a couple of
ideas:


1, I could make it require all nodes to acknowledge it before deleting the
difference records.

2, Maybe I could have another period after it's consolidated but before it's
deleted?

3, Is there any way I could use a TTL to allow it to be deleted after a
period of time? Chances are another read would come in and fix the value.


Anyone got any other suggestions on how I could implement this?


Thanks,

Charlie M

Re: Consolidating records and TTL

Posted by Tyler Hobbs <ty...@datastax.com>.
On Thu, Jun 5, 2014 at 2:38 PM, Charlie Mason <ch...@gmail.com> wrote:

>
> I can't do the initial account insert with a TTL as I can't guarantee when
> a new value would come along and so replace this account record. However,
> when I insert the new account record, instead of deleting the old one, could
> I reinsert it with a TTL of, say, 1 month?
>
> How would compaction handle this? Would the original record get compacted
> away after 1 month + the GC grace period, or would it hang around still?
>

Yes, after 1 month + gc_grace, it will be eligible for removal during
compaction.  Of course, a compaction on that sstable still has to take
place before it can be removed.  If you're using
SizeTieredCompactionStrategy (the default) and have a lot of data, that may
take a few more days.
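
As a sketch (using the account columns mentioned elsewhere in the thread; the
values are placeholders), the re-insert would look something like this:

-- Sketch only; 2592000 seconds = 30 days. Once the TTL expires the cells
-- become tombstones, and they are purged by a compaction some time after
-- TTL + gc_grace_seconds.
insert into account (account, total, last_tx_id)
values (42, 1050, 4b39fd10-ecdc-11e3-8080-808080808080)  -- the old row's values
using ttl 2592000;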


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Consolidating records and TTL

Posted by Charlie Mason <ch...@gmail.com>.
Hi All,

Thanks for the replies. These detailed explanations are extremely useful.
They are much appreciated!

One other question that came to mind with this. Is there any way I can use
a TTL to keep the old account entries around for a fixed length of time
rather than deleting them immediately? That way, if I had an issue, I could
replay the process from one of these earlier points. For the regular reads
of the account table I can use a limit of 1; with the reverse comparator
set I should always get the newest record without hitting an older record
first.
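
Something like the sketch below is what I mean (the extra clustering column
and the column names are only illustrative):

-- Illustrative only: account rows kept as versions, newest first.
create table account (
    account     int,
    updated_at  timeuuid,
    total       int,
    last_tx_id  timeuuid,
    primary key (account, updated_at)
) with clustering order by (updated_at desc);

-- Regular read: the newest version comes back first, so limit 1 never has
-- to scan past older (possibly TTL'd) versions.
select * from account where account = 42 limit 1;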

I can't do the initial account insert with a TTL as I can't guarantee when
a new value would come along and so replace this account record. However,
when I insert the new account record, instead of deleting the old one, could
I reinsert it with a TTL of, say, 1 month?

How would compaction handle this? Would the original record get compacted
away after 1 month + the GC grace period, or would it hang around still?

Thanks,

Charlie M


On Thu, Jun 5, 2014 at 5:32 PM, James Campbell <james@breachintelligence.com> wrote:

>  Thanks for creating and opening the discussion on this use case.  I have
> been evaluating Cassandra for a very similar problem, but with the
> small twist that I'd like to roll up the ledger entries into the
> aggregated 'account' information on a regular basis to ensure that even
> rarely-read (but often written) data will be periodically rolled up and
> ready for quick reads.  There is also a larger twist that for me 'account'
> is a compound key and the typical read pattern will include reading from multiple
> rows within the same partition (each of which would require separate
> merging from the ledger).
>
>
>  This response prompts three questions for me about that:
>
>
>  1. Just to clarify the requirement that only one client thread does this
> at a time, I assume you mean only one thread should handle any given
> 'account' at a time, right?  There wouldn't be problems having multiple
> clients do this data maintenance on different accounts at the same time, I
> hope.
>
>
>  2. For my use case, hadoop integration seems the natural fit, because
> I'd like to batch these updates to run at night during query down-time, but
> the current hadoop tools don't really support reading from or writing to
> multiple tables, as is required for this implementation.  I've seen a few
> posts from people who have written homegrown input/output formats for
> hadoop, but I haven't tried to use them or evaluate their stability.  Is
> there another mechanism I should be thinking about for that sort of batch
> updating?​
>
>
>  James Campbell
>  ------------------------------
> *From:* Aaron Morton <aa...@thelastpickle.com>
> *Sent:* Thursday, June 5, 2014 5:26 AM
> *To:* Cassandra User
> *Cc:* charlie.mas@gmail.com
> *Subject:* Re: Consolidating records and TTL
>
>  As Tyler says, with atomic batches which are enabled by default the
> cluster will keep trying to replay the insert / deletes.
>
>  Nodes check their local batch log for failed batches, ones where the
> coordinator did not acknowledge it had successfully completed, every 60
> seconds. So there is a window where it’s possible for not all mutations in
> the batch to be completed. This could happen when a write timeout occurs
> when processing a batch of 2 rows; the request CL will not have been
> achieved on one or more of the rows. The coordinator will leave it up to
> the batch log to replay the request, and the client driver will (by default
> config) not retry.
>
>  You can use a model like this.
>
>  create table ledger (
> account int,
> tx_id  timeuuid,
> sub_total  int,
> primary key (account, tx_id)
> );
>
>  create table account (
> account  int,
> total int,
> last_tx_id timeuuid,
> primary key (account)
> );
>
>  To get the total:
>
>  select * from account where account = X;
>
>  Then get the ledger entries you need
>
>  select * from ledger where account = X and tx_id > last_tx_id;
>
>  This query will degrade when the partition size in the ledger table gets
> bigger, as it will need to read the column index
> (see column_index_size_in_kb in yaml). It will use that to find the first
> page that contains the rows we are interested in and then read forwards to
> the end of the row. It’s not the most efficient type of read but if you are
> going to delete ledger entries this *should* be able to skip over the
> tombstones without reading them.
>
>  When you want to update the total in the account write to the account
> table and update both the total and the last_tx_id. You can then delete
> ledger entries if needed. Don’t forget to ensure that only one client
> thread is doing this at a time.
>
>  Hope that helps.
> Aaron
>
>
>       -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
>  Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>  On 5/06/2014, at 10:37 am, Tyler Hobbs <ty...@datastax.com> wrote:
>
>  Just use an atomic batch that holds both the insert and deletes:
> http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
>
>
> On Tue, Jun 3, 2014 at 2:13 PM, Charlie Mason <ch...@gmail.com>
> wrote:
>
>> Hi All.
>>
>>  I have a system that's going to make possibly several concurrent changes
>> to a running total. I know I could use a counter for this. However, I have
>> extra metadata I can store with the changes which would allow me to replay
>> the changes. If I use a counter and it loses some writes I can't recover
>> it, as I will only have its current total, not the extra metadata to know
>> where to replay from.
>>
>>  What I was planning to do was write each change of the value to a CQL
>> table with a Time UUID as a row level primary key as well as a partition
>> key. Then when I need to read the running total back I will do a query for
>> all the changes and add them up to get the total.
>>
>>  As there could be tens of thousands of these I want to have a period
>> after which these are consolidated. Most won't be anywhere near that, but a
>> few will, which I need to be able to support. So I was also going to have a
>> consolidated total table which holds the UUID of the values consolidated up
>> to. Since I can bound the query for the recent updates by the UUID I should
>> be able to avoid all the tombstones. So if the read encounters any changes
>> that can be consolidated, it inserts a new consolidated value and deletes
>> the newly consolidated changes.
>>
>>  What I am slightly worried about is what happens if the consolidated
>> value insert fails but the deletes to the change records succeed. I would
>> be left with an inconsistent total indefinitely. I have come up with a
>> couple of ideas:
>>
>>
>>  1, I could make it require all nodes to acknowledge it before deleting
>> the difference records.
>>
>>  2, Maybe I could have another period after it's consolidated but before
>> it's deleted?
>>
>>  3, Is there any way I could use a TTL to allow it to be deleted
>> after a period of time? Chances are another read would come in and fix the
>> value.
>>
>>
>>  Anyone got any other suggestions on how I could implement this?
>>
>>
>>  Thanks,
>>
>>  Charlie M
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>
>

RE: Consolidating records and TTL

Posted by James Campbell <ja...@breachintelligence.com>.
Thanks for creating and opening the discussion on this use case.  I have been evaluating Cassandra for a very similar problem, but with the small twist that I'd like to roll up the ledger entries into the aggregated 'account' information on a regular basis to ensure that even rarely-read (but often written) data will be periodically rolled up and ready for quick reads.  There is also a larger twist that for me 'account' is a compound key and the typical read pattern will include reading from multiple rows within the same partition (each of which would require separate merging from the ledger).


This response prompts three questions for me about that:


1. Just to clarify the requirement that only one client thread does this at a time, I assume you mean only one thread should handle any given 'account' at a time, right?  There wouldn't be problems having multiple clients do this data maintenance on different accounts at the same time, I hope.


2. For my use case, hadoop integration seems the natural fit, because I'd like to batch these updates to run at night during query down-time, but the current hadoop tools don't really support reading from or writing to multiple tables, as is required for this implementation.  I've seen a few posts from people who have written homegrown input/output formats for hadoop, but I haven't tried to use them or evaluate their stability.  Is there another mechanism I should be thinking about for that sort of batch updating?​


James Campbell

________________________________
From: Aaron Morton <aa...@thelastpickle.com>
Sent: Thursday, June 5, 2014 5:26 AM
To: Cassandra User
Cc: charlie.mas@gmail.com
Subject: Re: Consolidating records and TTL

As Tyler says, with atomic batches which are enabled by default the cluster will keep trying to replay the insert / deletes.

Nodes check their local batch log for failed batches, ones where the coordinator did not acknowledge it had successfully completed, every 60 seconds. So there is a window where it’s possible for not all mutations in the batch to be completed. This could happen when a write timeout occurs when processing a batch of 2 rows; the request CL will not have been achieved on one or more of the rows. The coordinator will leave it up to the batch log to replay the request, and the client driver will (by default config) not retry.

You can use a model like this.

create table ledger (
account int,
tx_id  timeuuid,
sub_total  int,
primary key (account, tx_id)
);

create table account (
account  int,
total int,
last_tx_id timeuuid,
primary key (account)
);

To get the total:

select * from account where account = X;

Then get the ledger entries you need

select * from ledger where account = X and tx_id > last_tx_id;

This query will degrade when the partition size in the ledger table gets bigger, as it will need to read the column index (see column_index_size_in_kb in yaml). It will use that to find the first page that contains the rows we are interested in and then read forwards to the end of the row. It’s not the most efficient type of read but if you are going to delete ledger entries this *should* be able to skip over the tombstones without reading them.

When you want to update the total in the account write to the account table and update both the total and the last_tx_id. You can then delete ledger entries if needed. Don’t forget to ensure that only one client thread is doing this at a time.

Hope that helps.
Aaron


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 5/06/2014, at 10:37 am, Tyler Hobbs <ty...@datastax.com> wrote:

Just use an atomic batch that holds both the insert and deletes: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2


On Tue, Jun 3, 2014 at 2:13 PM, Charlie Mason <ch...@gmail.com> wrote:
Hi All.

I have a system that's going to make possibly several concurrent changes to a running total. I know I could use a counter for this. However, I have extra metadata I can store with the changes which would allow me to replay the changes. If I use a counter and it loses some writes I can't recover it, as I will only have its current total, not the extra metadata to know where to replay from.

What I was planning to do was write each change of the value to a CQL table with a Time UUID as a row level primary key as well as a partition key. Then when I need to read the running total back I will do a query for all the changes and add them up to get the total.

As there could be tens of thousands of these I want to have a period after which these are consolidated. Most won't be anywhere near that, but a few will, which I need to be able to support. So I was also going to have a consolidated total table which holds the UUID of the values consolidated up to. Since I can bound the query for the recent updates by the UUID I should be able to avoid all the tombstones. So if the read encounters any changes that can be consolidated, it inserts a new consolidated value and deletes the newly consolidated changes.

What I am slightly worried about is what happens if the consolidated value insert fails but the deletes to the change records succeed. I would be left with an inconsistent total indefinitely. I have come up with a couple of ideas:


1, I could make it require all nodes to acknowledge it before deleting the difference records.

2, Maybe I could have another period after it's consolidated but before it's deleted?

3, Is there any way I could use a TTL to allow it to be deleted after a period of time? Chances are another read would come in and fix the value.


Anyone got any other suggestions on how I could implement this?


Thanks,

Charlie M



--
Tyler Hobbs
DataStax<http://datastax.com/>


Re: Consolidating records and TTL

Posted by Aaron Morton <aa...@thelastpickle.com>.
As Tyler says, with atomic batches, which are enabled by default, the cluster will keep trying to replay the insert / deletes.

Nodes check their local batch log for failed batches, ones where the coordinator did not acknowledge it had successfully completed, every 60 seconds. So there is a window where it’s possible for not all mutations in the batch to be completed. This could happen when a write timeout occurs when processing a batch of 2 rows; the request CL will not have been achieved on one or more of the rows. The coordinator will leave it up to the batch log to replay the request, and the client driver will (by default config) not retry. 

You can use a model like this. 

create table ledger (
	account 		int, 
	tx_id 		timeuuid, 
	sub_total 		int,
	primary key (account, tx_id)
);

create table account (
	account 		int, 
	total			int, 
	last_tx_id		timeuuid, 
	primary key (account)
);

To get the total:

select * from account where account = X;

Then get the ledger entries you need

select * from ledger where account = X and tx_id > last_tx_id;

This query will degrade when the partition size in the ledger table gets bigger, as it will need to read the column index (see column_index_size_in_kb in yaml). It will use that to find the first page that contains the rows we are interested in and then read forwards to the end of the row. It’s not the most efficient type of read but if you are going to delete ledger entries this *should* be able to skip over the tombstones without reading them. 

When you want to update the total in the account, write to the account table and update both the total and the last_tx_id. You can then delete ledger entries if needed. Don't forget to ensure that only one client thread is doing this at a time.
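
As a sketch, with placeholder values, the roll-up write and the ledger cleanup
can go in a single logged batch, as Tyler suggested:

-- Sketch only; values are placeholders. Each consolidated ledger row is
-- deleted by its full primary key.
begin batch
  update account
    set total = 1050, last_tx_id = 4b39fd10-ecdc-11e3-8080-808080808080
    where account = 42;
  delete from ledger
    where account = 42 and tx_id = 2f0d5e60-ecdc-11e3-8080-808080808080;
  delete from ledger
    where account = 42 and tx_id = 4b39fd10-ecdc-11e3-8080-808080808080;
apply batch;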

Hope that helps. 
Aaron


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 5/06/2014, at 10:37 am, Tyler Hobbs <ty...@datastax.com> wrote:

> Just use an atomic batch that holds both the insert and deletes: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
> 
> 
> On Tue, Jun 3, 2014 at 2:13 PM, Charlie Mason <ch...@gmail.com> wrote:
> Hi All.
> 
> I have a system that's going to make possibly several concurrent changes to a running total. I know I could use a counter for this. However, I have extra metadata I can store with the changes which would allow me to replay the changes. If I use a counter and it loses some writes I can't recover it, as I will only have its current total, not the extra metadata to know where to replay from.
> 
> What I was planning to do was write each change of the value to a CQL table with a Time UUID as a row level primary key as well as a partition key. Then when I need to read the running total back I will do a query for all the changes and add them up to get the total.
> 
> As there could be tens of thousands of these I want to have a period after which these are consolidated. Most won't be anywhere near that, but a few will, which I need to be able to support. So I was also going to have a consolidated total table which holds the UUID of the values consolidated up to. Since I can bound the query for the recent updates by the UUID I should be able to avoid all the tombstones. So if the read encounters any changes that can be consolidated, it inserts a new consolidated value and deletes the newly consolidated changes.
> 
> What I am slightly worried about is what happens if the consolidated value insert fails but the deletes to the change records succeed. I would be left with an inconsistent total indefinitely. I have come up with a couple of ideas:
> 
> 
> 1, I could make it require all nodes to acknowledge it before deleting the difference records.
> 
> 2, Maybe I could have another period after it's consolidated but before it's deleted?
> 
> 3, Is there any way I could use a TTL to allow it to be deleted after a period of time? Chances are another read would come in and fix the value.
> 
> 
> Anyone got any other suggestions on how I could implement this?
> 
> 
> Thanks,
> 
> Charlie M
> 
> 
> 
> -- 
> Tyler Hobbs
> DataStax


Re: Consolidating records and TTL

Posted by Tyler Hobbs <ty...@datastax.com>.
Just use an atomic batch that holds both the insert and deletes:
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
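
A sketch of what that looks like for the consolidation case, using placeholder
table and column names echoing the original post above:

-- Sketch only; names and values are placeholders. If the coordinator fails
-- part way through, the batch log replays the whole batch, so the
-- consolidated insert and the deletes succeed or are retried together.
begin batch
  insert into consolidated (total_key, consolidated_to, total)
  values (42, 4b39fd10-ecdc-11e3-8080-808080808080, 1050);
  delete from changes
  where total_key = 42 and change_id = 2f0d5e60-ecdc-11e3-8080-808080808080;
apply batch;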


On Tue, Jun 3, 2014 at 2:13 PM, Charlie Mason <ch...@gmail.com> wrote:

> Hi All.
>
> I have a system that's going to make possibly several concurrent changes to
> a running total. I know I could use a counter for this. However, I have
> extra metadata I can store with the changes which would allow me to replay
> the changes. If I use a counter and it loses some writes I can't recover
> it, as I will only have its current total, not the extra metadata to know
> where to replay from.
>
> What I was planning to do was write each change of the value to a CQL
> table with a Time UUID as a row level primary key as well as a partition
> key. Then when I need to read the running total back I will do a query for
> all the changes and add them up to get the total.
>
> As there could be tens of thousands of these I want to have a period after
> which these are consolidated. Most won't be anywhere near that, but a few
> will, which I need to be able to support. So I was also going to have a
> consolidated total table which holds the UUID of the values consolidated up
> to. Since I can bound the query for the recent updates by the UUID I should
> be able to avoid all the tombstones. So if the read encounters any changes
> that can be consolidated, it inserts a new consolidated value and deletes
> the newly consolidated changes.
>
> What I am slightly worried about is what happens if the consolidated value
> insert fails but the deletes to the change records succeed. I would be left
> with an inconsistent total indefinitely. I have come up with a couple of
> ideas:
>
>
> 1, I could make it require all nodes to acknowledge it before deleting the
> difference records.
>
> 2, Maybe I could have another period after it's consolidated but before
> it's deleted?
>
> 3, Is there any way I could use a TTL to allow it to be deleted after
> a period of time? Chances are another read would come in and fix the value.
>
>
> Anyone got any other suggestions on how I could implement this?
>
>
> Thanks,
>
> Charlie M
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>