Posted to user@cassandra.apache.org by Manoj Khangaonkar <kh...@gmail.com> on 2014/01/10 22:50:28 UTC

Read/Write consistency issue

Hi

Using Cassandra 2.0.0.
3 node cluster
Replication factor 2.
Using consistency ALL for both reads and writes.

I have a single thread that reads a value, updates it and writes it back to
the table. The column type is big int. Updating counts for a timestamp.

With a single thread and consistency ALL, I expect no lost updates. But as
seen from my application log below:

10 07:01:58,507 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
 old=59614 val =252 new =59866
10 07:01:58,611 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
 old=59866 val =252 new =60118
10 07:01:59,136 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
 old=60118 val =255 new =60373
10 07:02:00,242 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
 old=60373 val =243 new =60616
10 07:02:00,244 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
 old=60616 val =19 new =60635
10 07:02:00,326 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
 old=60616 val =233 new =60849

See the last 2 lines of the log above.
The value 60616 is updated to 60635, but the next operation reads the old
value 60616 again.

I am not using the counter column type because it does not support TTL, and I
hear there are a lot of open issues with counters.

Is there anything else I can do to further tighten consistency, or is this
pattern of high-volume read-update-write not going to work in C*?

regards
MJ

--

Re: Read/Write consistency issue

Posted by Manoj Khangaonkar <kh...@gmail.com>.
old is the value that was read from the column.
val is the value that needs to be added to it.
new is (old + val) that is written back to the column.
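Spelled out, each step is a plain read-modify-write. A minimal Python sketch (a hypothetical stand-in for the table, using the values from the log above) shows how one stale read silently drops an increment:

```python
# Hypothetical stand-in for the read-update-write loop in this thread:
# "old" is read from the column, "val" is added, "new" is written back.
def apply_update(old, val):
    return old + val

# Normal case: each read observes the previous write.
count = apply_update(60373, 243)      # old=60373 val=243 new=60616
assert count == 60616
new1 = apply_update(count, 19)        # old=60616 val=19  new=60635

# Failure case (last log line): the read is stale and returns 60616
# again, so the +19 increment written as 60635 is silently lost.
stale_read = count                    # should have been 60635
new2 = apply_update(stale_read, 233)  # old=60616 val=233 new=60849
assert new2 == 60849
assert new2 != new1 + 233             # 60868 would have been the true total
```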

regards



On Fri, Jan 10, 2014 at 4:36 PM, Andrey Ilinykh <ai...@gmail.com> wrote:

> For single thread, consistency ALL it should work. I believe you do
> something different. What are these three numbers exactly?
> old=60616 val =19 new =60635


-- 
http://khangaonkar.blogspot.com/

Re: Read/Write consistency issue

Posted by Andrey Ilinykh <ai...@gmail.com>.
For a single thread with consistency ALL it should work. I believe you are
doing something different. What are these three numbers exactly?
old=60616 val =19 new =60635



RE: Read/Write consistency issue

Posted by Todd Carrico <To...@match.com>.
I think the anti-pattern is more about the read/write trying to be atomic.

You might want to logically lock your record unless you are pretty sure you have figured out how to keep users from overwriting each other's edits, is all.

tc

From: Robert Wille [mailto:rwille@fold3.com]
Sent: Friday, January 10, 2014 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Read/Write consistency issue

Interested in knowing more on why read-before-write is an anti-pattern. In the next month or so, I intend to use Cassandra as a doc store. One very common operation will be to read the document, make a change, and write it back. These would be interactive users modifying their own documents, so rapid repeated writing is not an issue. Why would this be bad?

Robert

From: Steven A Robenalt <sr...@stanford.edu>
Reply-To: <us...@cassandra.apache.org>
Date: Friday, January 10, 2014 at 3:41 PM
To: <us...@cassandra.apache.org>
Subject: Re: Read/Write consistency issue

My understanding is that it's generally a Cassandra anti-pattern to do read-before-write in any case, not just because of this issue. I'd agree with Robert's suggestion earlier in this thread of writing each update independently and aggregating on read.

Steve


Re: Read/Write consistency issue

Posted by Steven A Robenalt <sr...@stanford.edu>.
Hi Robert,

Just to clarify a bit, there's nothing inherently wrong with a
read-modify-write cycle as you would use for a document store. The
read-before-write antipattern refers to depending on a read immediately
before a write, as was being done in the original post. Generally, such a
read is done either (a) to verify that the underlying record hasn't changed
immediately before updating, or (b) to merge updated parts of the document
with those excluded from the original read. Obviously, both can be
problematic if concurrent modifications are being performed, or if the
operations required to perform the update are executed concurrently.

The original post was problematic for a different reason: updating the
same column very rapidly, with the read-before-write antipattern built into
the update. This fails occasionally because the database is not yet
consistent by the time the next read is performed. The result is an update
that mostly, but not always, succeeds.

Using Lightweight Transactions and BatchStatements can address many of
these problems in a normal OLTP environment as with a document store, and
will not be likely to have a negative impact on performance, but rapidly
updated time series data is a different animal, and requires its own
strategies and patterns.
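A rough sketch of the compare-and-set retry loop behind a lightweight transaction, simulated with a plain dict standing in for the table (the CQL shape would be UPDATE ... IF count = ?; this is an illustration, not driver code):

```python
# A plain dict stands in for the Cassandra table; cas_update() mimics a
# conditional write "UPDATE t SET count = ? WHERE k = ? IF count = ?",
# which reports an [applied] flag instead of blindly overwriting.
table = {"1389366000": 0}

def cas_update(key, expected, new):
    if table[key] == expected:
        table[key] = new
        return True      # [applied] = true
    return False         # [applied] = false: caller must re-read and retry

def add(key, val):
    while True:
        old = table[key]             # read the current value
        if cas_update(key, old, old + val):
            return old + val         # loop again if a conflict occurred

add("1389366000", 252)
add("1389366000", 19)
assert table["1389366000"] == 271
```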

Steve








-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobenal@stanford.edu
http://highwire.stanford.edu

RE: Read/Write consistency issue

Posted by Todd Carrico <To...@match.com>.
I've solved this for other systems, and it might work here.

Add a Guid as a field to the record.
When you update the document, check to make sure the Guid hasn't changed since you read it.  If the Guid is the same, go ahead and save the document along with a new Guid.

This keeps you from locking the document if you just want to read it while still keeping you from overwriting someone else's changes.  In this other system, it was easy enough to add the guid check as part of the where clause:

Update doc
                Set Text = Text
Where key = ?
And Guid = ?

If the row failed to update, then it was removed, or the Guids didn't match.

Not sure if C* has some magic that can make this better; a timestamp should do the same thing, I think.
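The same GUID check, simulated in Python (a dict stands in for the document table; all names here are illustrative):

```python
import uuid

# A dict stands in for the document table. save() only succeeds when the
# stored GUID still matches the one read earlier; otherwise another
# writer got there first and the caller must re-read and retry.
doc_store = {"doc1": {"text": "v1", "guid": uuid.uuid4()}}

def save(key, new_text, read_guid):
    if doc_store[key]["guid"] != read_guid:
        return False                       # row changed since our read
    doc_store[key] = {"text": new_text, "guid": uuid.uuid4()}
    return True

g = doc_store["doc1"]["guid"]              # read the document and its GUID
assert save("doc1", "v2", g) is True       # first save wins
assert save("doc1", "v3", g) is False      # stale GUID is rejected
```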

"There are a multitude of methods whereby a feline might be divested of its epidermal layer"..




Re: Read/Write consistency issue

Posted by Tupshin Harper <tu...@tupshin.com>.
It is bad because of the risk of concurrent modifications. If you don't
have some kind of global lock on the document/row, then 2 readers might
read version A, reader 1 writes version B based on A, and reader 2 writes
version C based on A, overwriting the changes in B. This is *inherent* to
the notion of distributed systems and multiple writers, and can only be
fixed by:
1) Having a global lock, either in the form of a DB lock (CAS for Cassandra
2.0 and above), or some higher level business mechanism that is ensuring
only one concurrent reader/writer for a given document
2) Idempotent writes by appending at write and aggregate on read. For
time-series and possibly counter style information, this is often the ideal
strategy, but usually not so good for documents.

For the counters scenario, idempotent writes, or the rewrite of counters
(which use idempotent writes behind the scenes) are probably good solutions.

Concurrent editing of documents, on the other hand, is almost the ideal
scenario for lightweight transactions.
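Option 2, sketched as a Python simulation (the bucket/event-id layout is a hypothetical stand-in for a real time-series table):

```python
from collections import defaultdict

# Each update is stored under its own unique event id instead of
# rewriting one cell, so a replayed or reordered write cannot lose an
# increment; the total is computed at read time.
events = defaultdict(dict)

def record(bucket, event_id, val):
    events[bucket][event_id] = val      # idempotent: same id, same value

def read_count(bucket):
    return sum(events[bucket].values())

record("1389366000", "e1", 252)
record("1389366000", "e2", 19)
record("1389366000", "e1", 252)         # duplicate write changes nothing
assert read_count("1389366000") == 271
```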

-Tupshin


Re: Read/Write consistency issue

Posted by Robert Wille <rw...@fold3.com>.
There is a solution to this problem that I forgot about. The client can
provide the timestamps. If you provide your own timestamps using a
monotonically increasing sequence, then your code will work, since it makes
you immune to clock drift and multiple transactions in the same millisecond.
If you are always generating data from a single process, this is pretty
trivial. If, in your real environment, you are adding up these numbers from
multiple processes, then you'll need to obtain timestamps from a timestamp
server.
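A minimal single-process sketch of such a timestamp source (illustrative Python, not driver code):

```python
import time

# Microseconds since the epoch, bumped by at least 1 on every call, so
# two writes issued in the same tick still get distinct, ordered
# timestamps and the clock never appears to run backwards.
_last = 0

def next_timestamp():
    global _last
    now_us = int(time.time() * 1_000_000)
    _last = max(_last + 1, now_us)
    return _last

a = next_timestamp()
b = next_timestamp()
assert b > a
```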

Robert

From:  Steven A Robenalt <sr...@stanford.edu>
Reply-To:  <us...@cassandra.apache.org>
Date:  Friday, January 10, 2014 at 4:59 PM
To:  <us...@cassandra.apache.org>
Subject:  Re: Read/Write consistency issue

As was pointed out earlier, Consistency.ALL is still subject to the
possibility of clock drift between nodes, and there is also the problem of
using the exact same timestamp, which is increasingly likely to happen the
faster you update, and the more data points you process. Better to design
with Cassandra's strengths in mind, I'd think.

Steve











Re: Read/Write consistency issue

Posted by Steven A Robenalt <sr...@stanford.edu>.
As was pointed out earlier, Consistency.ALL is still subject to the
possibility of clock drift between nodes, and there is also the problem of
using the exact same timestamp, which is increasingly likely to happen the
faster you update, and the more data points you process. Better to design
with Cassandra's strengths in mind, I'd think.
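The same-timestamp hazard, as a small illustrative Python simulation (the tie-break used here is a simplification of Cassandra's actual rule, and the numbers are stand-ins for the colliding log lines):

```python
# Each write is a (timestamp_ms, value) pair; the later timestamp wins,
# and on a timestamp tie the larger tuple wins (a simplification of the
# real last-write-wins tie-break rule).
def resolve(a, b):
    return max(a, b)

w1 = (1000, 60635)   # old=60616 + 19, written in millisecond 1000
w2 = (1000, 60849)   # old=60616 + 233, same millisecond
assert resolve(w1, w2) == w2   # the +19 increment is silently dropped
```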

Steve



On Fri, Jan 10, 2014 at 3:29 PM, Manoj Khangaonkar <kh...@gmail.com>wrote:

> Thanks all for the response. I will change to keeping writes idempotent
> and aggregate at a later stage.
>
> But considering my read , write , read operations are sequential and from
> the same thread and with Consistency ALL,
> the write should not return until all replicas have committed. So I am
> expecting all replicas to have the same value, when the next read happens.
> Not true ??
>
> regards
>
>
> On Fri, Jan 10, 2014 at 2:51 PM, Tupshin Harper <tu...@tupshin.com>wrote:
>
>> Yes this is pretty close to the ultimate anti-pattern in Cassandra.
>> Whenever possible, we encourage models where your updates are idempotent,
>> and not dependent on a read before write. Manoj is looking for what is
>> essentially strong ordering in a distributed system, which always has
>> inherent trade-offs.
>>
>> CAS (lightweight transactions) in 2.0 might actually be usable for this,
>> but it will badly hurt your performance, and not recommended.
>>
>> 2.1 counters (major counter rewrite) are actually very likely to be a
>> great fit for this, but they still won't have TTL. That, however, could
>> easily be worked around, IMO. It would just require a bit of housekeeping
>> to keep track of your counters and lazily delete them.
>>
>> But yes, I third Robert's suggestion of aggregate on read instead of
>> write.
>>
>> -Tupshin
>>
>>
>> On Fri, Jan 10, 2014 at 5:41 PM, Steven A Robenalt <srobenal@stanford.edu
>> > wrote:
>>
>>> My understanding is that it's generally a Cassandra anti-pattern to do
>>> read-before-write in any case, not just because of this issue. I'd agree
>>> with Robert's suggestion earlier in this thread of writing each update
>>> independently and aggregating on read.
>>>
>>> Steve
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 2:35 PM, Robert Wille <rw...@fold3.com> wrote:
>>>
>>>> Actually, locking won’t fix the problem. He’s getting the problem on a
>>>> single thread.
>>>>
>>>> I’m pretty sure that if updates can occur within the same millisecond
>>>> (or more, if there is clock skew), there is literally nothing you can do to
>>>> make this pattern work.
>>>>
>>>> Robert
>>>>
>>>> From: Todd Carrico <To...@match.com>
>>>> Reply-To: <us...@cassandra.apache.org>
>>>> Date: Friday, January 10, 2014 at 3:28 PM
>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>> Subject: RE: Read/Write consistency issue
>>>>
>>>> That, or roll your own locking.  Means multiple updates, but it works
>>>> reliably.
>>>>
>>>>
>>>>
>>>> tc
>>>>
>>>>
>>>>
>>>> *From:* Robert Wille [mailto:rwille@fold3.com <rw...@fold3.com>]
>>>> *Sent:* Friday, January 10, 2014 4:25 PM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: Read/Write consistency issue
>>>>
>>>>
>>>>
>>>> Cassandra is a last-write-wins kind of a deal. The last write is
>>>> determined by the timestamp. There are two problems with this:
>>>>
>>>>    1. If your clocks are not synchronized, you're totally screwed.
>>>>    Note that the 2nd and 3rd to last operations occurred just 2
>>>>    milliseconds apart. A clock skew of 2 milliseconds would definitely
>>>>    manifest itself like that.
>>>>    2. Even if your clocks are perfectly synchronized, timestamps only
>>>>    have millisecond granularity. If multiple writes occur within the
>>>>    same millisecond, it's impossible for Cassandra to determine which
>>>>    one occurred last.
>>>>
>>>> Lots of really good information here:
>>>> http://aphyr.com/posts/294-call-me-maybe-cassandra/
>>>>
>>>>
>>>>
>>>> I’d be very interested in hearing what others have to say. In the
>>>> article I just linked to, the author experienced similar problems, even
>>>> with “perfectly synchronized clocks”, whatever that means.
>>>>
>>>>
>>>>
>>>> The conclusion I’ve arrived at after reading and pondering is that if
>>>> you perform multiple updates to a cell, even with synchronous calls from a
>>>> single-threaded app, if those updates occur less than a millisecond apart,
>>>> or approach the sum of the clock drift and network latency, you’re probably
>>>> hosed.
>>>>
>>>>
>>>>
>>>> I think a better approach for Cassandra would be to write new values
>>>> each time, and then sum them up on read, or perhaps have a process that
>>>> periodically aggregates them. It’s a tricky business for sure, not one that
>>>> Cassandra is very well equipped to handle.
>>>>
>>>>
>>>>
>>>> Robert
>>>>
>>>>
>>>>
>



-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobenal@stanford.edu
http://highwire.stanford.edu

Re: Read/Write consistency issue

Posted by Tupshin Harper <tu...@tupshin.com>.
That really should work, unless I'm missing something. If you retry your
test with either 1.2.13 or 2.0.4 (as opposed to earlier releases of either
branch), and triple check your observations to make sure that your single
threaded code is doing what you think it is, and still see the behaviour, I
would want to investigate this much more deeply.

-Tupshin


On Fri, Jan 10, 2014 at 6:29 PM, Manoj Khangaonkar <kh...@gmail.com>wrote:

> Thanks all for the response. I will change to keeping writes idempotent
> and aggregate at a later stage.
>
> But considering my read , write , read operations are sequential and from
> the same thread and with Consistency ALL,
> the write should not return until all replicas have committed. So I am
> expecting all replicas to have the same value, when the next read happens.
> Not true ??
>
> regards
>
>
>
>
> --
> http://khangaonkar.blogspot.com/
>

Re: Read/Write consistency issue

Posted by Manoj Khangaonkar <kh...@gmail.com>.
Thanks all for the response. I will change to keeping writes idempotent and
aggregate at a later stage.

But considering my read, write, read operations are sequential and from
the same thread and with consistency ALL, the write should not return
until all replicas have committed. So I am expecting all replicas to have
the same value when the next read happens. Not true?
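The lost-update question above can be made concrete with a compare-and-set retry loop, in the spirit of the lightweight transactions mentioned in this thread. This is a toy in-memory model for illustration only, not Cassandra driver code; the class and function names are invented here:

```python
# Toy compare-and-set cell, loosely modelling the effect of a CQL
# conditional update ("UPDATE ... SET v = ? ... IF v = ?").
class CasCell:
    def __init__(self, value=0):
        self.value = value

    def compare_and_set(self, expected, new):
        # Apply the write only if nobody changed the value since we read it.
        if self.value == expected:
            self.value = new
            return True
        return False

def add_with_cas(cell, delta, max_retries=10):
    """Read-modify-write that retries instead of silently losing updates."""
    for _ in range(max_retries):
        old = cell.value                       # read
        if cell.compare_and_set(old, old + delta):  # conditional write
            return old + delta
    raise RuntimeError("CAS retries exhausted")

cell = CasCell(60616)
print(add_with_cas(cell, 19))    # 60635
print(add_with_cas(cell, 233))   # 60868 -- the second update is not lost
```

Unlike the plain read-then-write in the log above, a failed condition is reported back to the caller, so the increment can be retried rather than dropped.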

regards


On Fri, Jan 10, 2014 at 2:51 PM, Tupshin Harper <tu...@tupshin.com> wrote:

> Yes this is pretty close to the ultimate anti-pattern in Cassandra.
> Whenever possible, we encourage models where your updates are idempotent,
> and not dependent on a read before write. Manoj is looking for what is
> essentially strong ordering in a distributed system, which always has
> inherent trade-offs.
>
> CAS (lightweight transactions) in 2.0 might actually be usable for this,
> but it will badly hurt your performance, and not recommended.
>
> 2.1 counters (major counter rewrite) are actually very likely to be a
> great fit for this, but they still won't have TTL. That, however, could
> easily be worked around, IMO. It would just require a bit of housekeeping
> to keep track of your counters and lazily delete them.
>
> But yes, I third Robert's suggestion of aggregate on read instead of write.
>
> -Tupshin
>
>


-- 
http://khangaonkar.blogspot.com/

Re: Read/Write consistency issue

Posted by Tupshin Harper <tu...@tupshin.com>.
Yes this is pretty close to the ultimate anti-pattern in Cassandra.
Whenever possible, we encourage models where your updates are idempotent,
and not dependent on a read before write. Manoj is looking for what is
essentially strong ordering in a distributed system, which always has
inherent trade-offs.

CAS (lightweight transactions) in 2.0 might actually be usable for this,
but it will badly hurt your performance and is not recommended.

2.1 counters (major counter rewrite) are actually very likely to be a great
fit for this, but they still won't have TTL. That, however, could easily be
worked around, IMO. It would just require a bit of housekeeping to keep
track of your counters and lazily delete them.

But yes, I third Robert's suggestion of aggregate on read instead of write.

-Tupshin
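The aggregate-on-read pattern recommended above can be sketched with a toy in-memory table. The schema in the comment and all names here are illustrative assumptions, not an actual driver API:

```python
import uuid

# In-memory stand-in for a table like:
#   CREATE TABLE counter_events (bucket bigint, event_id uuid, delta bigint,
#                                PRIMARY KEY (bucket, event_id))
# Every update is its own row, so no read-before-write is needed and a
# retried write (same event_id) overwrites itself instead of double counting.
table = {}

def record(bucket, delta, event_id=None):
    """Write one delta as an independent, idempotent row."""
    event_id = event_id or uuid.uuid4()
    table[(bucket, event_id)] = delta
    return event_id

def read_total(bucket):
    """Aggregate on read: sum all deltas for the bucket."""
    return sum(d for (b, _), d in table.items() if b == bucket)

bucket = 1389366000
for delta in (252, 252, 255, 243, 19, 233):
    record(bucket, delta)

# Retrying the same event is a no-op rather than a double count.
eid = record(bucket, 100)
record(bucket, 100, event_id=eid)
print(read_total(bucket))  # 252+252+255+243+19+233+100 = 1354
```

TTL-style housekeeping would map onto this by expiring or lazily deleting old event rows, as suggested above for the 2.1 counter workaround.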


On Fri, Jan 10, 2014 at 5:41 PM, Steven A Robenalt <sr...@stanford.edu>wrote:

> My understanding is that it's generally a Cassandra anti-pattern to do
> read-before-write in any case, not just because of this issue. I'd agree
> with Robert's suggestion earlier in this thread of writing each update
> independently and aggregating on read.
>
> Steve
>
>
>

Re: Read/Write consistency issue

Posted by Robert Wille <rw...@fold3.com>.
Interested in knowing more about why read-before-write is an anti-pattern. In
the next month or so, I intend to use Cassandra as a doc store. One very
common operation will be to read the document, make a change, and write it
back. These would be interactive users modifying their own documents, so
rapid repeated writing is not an issue. Why would this be bad?

Robert

From:  Steven A Robenalt <sr...@stanford.edu>
Reply-To:  <us...@cassandra.apache.org>
Date:  Friday, January 10, 2014 at 3:41 PM
To:  <us...@cassandra.apache.org>
Subject:  Re: Read/Write consistency issue

My understanding is that it's generally a Cassandra anti-pattern to do
read-before-write in any case, not just because of this issue. I'd agree
with Robert's suggestion earlier in this thread of writing each update
independently and aggregating on read.

Steve




Re: Read/Write consistency issue

Posted by Steven A Robenalt <sr...@stanford.edu>.
My understanding is that it's generally a Cassandra anti-pattern to do
read-before-write in any case, not just because of this issue. I'd agree
with Robert's suggestion earlier in this thread of writing each update
independently and aggregating on read.

Steve



On Fri, Jan 10, 2014 at 2:35 PM, Robert Wille <rw...@fold3.com> wrote:

> Actually, locking won’t fix the problem. He’s getting the problem on a
> single thread.
>
> I’m pretty sure that if updates can occur within the same millisecond (or
> more, if there is clock skew), there is literally nothing you can do to
> make this pattern work.
>
> Robert
>



-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobenal@stanford.edu
http://highwire.stanford.edu

RE: Read/Write consistency issue

Posted by Todd Carrico <To...@match.com>.
Is it possible to pin to a node, instead of letting the client find the next node (round robin)?

Sorry, a C* noob here...

tc

From: Robert Wille [mailto:rwille@fold3.com]
Sent: Friday, January 10, 2014 4:35 PM
To: user@cassandra.apache.org
Subject: Re: Read/Write consistency issue

Actually, locking won't fix the problem. He's getting the problem on a single thread.

I'm pretty sure that if updates can occur within the same millisecond (or more, if there is clock skew), there is literally nothing you can do to make this pattern work.

Robert

From: Todd Carrico <To...@match.com>>
Reply-To: <us...@cassandra.apache.org>>
Date: Friday, January 10, 2014 at 3:28 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: RE: Read/Write consistency issue

That, or roll your own locking.  Means multiple updates, but it works reliably.

tc

From: Robert Wille [mailto:rwille@fold3.com]
Sent: Friday, January 10, 2014 4:25 PM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: Read/Write consistency issue

Cassandra is a last-write wins kind of a deal. The last write is determined by the timestamp. There are two problems with this:

  1.  If your clocks are not synchronized, you're totally screwed. Note that the 2nd and 3rd to last operations occurred just 2 milliseconds apart. A clock skew of 2 milliseconds would definitely manifest itself like that.
  2.  Even if your clocks are perfectly synchronized, timestamps only have millisecond granularity. If multiple writes occur within the same millisecond, it's impossible for Cassandra to determine which one occurred last.
Lots of really good information here: http://aphyr.com/posts/294-call-me-maybe-cassandra/

I'd be very interested in hearing what others have to say. In the article I just linked to, the author experienced similar problems, even with "perfectly synchronized clocks", whatever that means.

The conclusion I've arrived at after reading and pondering is that if you perform multiple updates to a cell, even with synchronous calls from a single-threaded app, if those updates occur less than a millisecond apart, or approach the sum of the clock drift and network latency, you're probably hosed.

I think a better approach for Cassandra would be to write new values each time, and then sum them up on read, or perhaps have a process that periodically aggregates them. It's a tricky business for sure, not one that Cassandra is very well equipped to handle.

Robert


Re: Read/Write consistency issue

Posted by Robert Wille <rw...@fold3.com>.
Actually, locking won't fix the problem. He's getting the problem on a
single thread.

I'm pretty sure that if updates can occur within the same millisecond (or
more, if there is clock skew), there is literally nothing you can do to make
this pattern work.

Robert
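The same-millisecond failure mode described in this thread can be sketched with a toy last-write-wins cell. This is a deliberately simplified model (real Cassandra resolves exact timestamp ties by comparing cell values, which still picks an arbitrary winner); all names are illustrative:

```python
# Toy last-write-wins cell: a write is applied only if its timestamp is
# strictly newer than the stored one, so a write that shares a millisecond
# timestamp with the previous write is silently dropped.
class Cell:
    def __init__(self):
        self.value, self.ts = 0, -1

    def write(self, value, ts):
        if ts > self.ts:          # equal timestamps: this write loses
            self.value, self.ts = value, ts

    def read(self):
        return self.value

def add(cell, delta, ts):
    # Read-modify-write, as in the thread's single-threaded DAO.
    cell.write(cell.read() + delta, ts)

cell = Cell()
add(cell, 19, ts=1000)   # lands in millisecond 1000
add(cell, 233, ts=1000)  # same millisecond: update silently lost
print(cell.read())       # 19, not 252
```

This mirrors the log above: the write succeeds from the client's point of view, but the next read still sees the earlier value.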

From:  Todd Carrico <To...@match.com>
Reply-To:  <us...@cassandra.apache.org>
Date:  Friday, January 10, 2014 at 3:28 PM
To:  "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject:  RE: Read/Write consistency issue

That, or roll your own locking.  Means multiple updates, but it works
reliably.
 
tc
 

From: Robert Wille [mailto:rwille@fold3.com]
Sent: Friday, January 10, 2014 4:25 PM
To: user@cassandra.apache.org
Subject: Re: Read/Write consistency issue
 

Cassandra is a last-write wins kind of a deal. The last write is determined
by the timestamp. There are two problems with this:
1. If your clocks are not synchronized, you¹re totally screwed. Note that
the 2nd and 3rd to last operations occurred just 2 milliseconds apart. A
clock skew of 2 milliseconds would definitely manifest itself like that.
2. Even if your clocks are perfectly synchronized, timestamps only have
millisecond granularity. If multiple writes occur within the same
millisecond, its impossible for Cassandra to determine which one occurred
last.
Lots of really good information here:
http://aphyr.com/posts/294-call-me-maybe-cassandra/

 

I¹d be very interested in hearing what others have to say. In the article I
just linked to, the author experienced similar problems, even with
³perfectly synchronized clocks², whatever that means.

 

The conclusion I¹ve arrived at after reading and pondering is that if you
perform multiple updates to a cell, even with synchronous calls from a
single-threaded app, if those updates occur less than a millisecond apart,
or approach the sum of the clock drift and network latency, you¹re probably
hosed.

 

I think a better approach for Cassandra would be to write new values each
time, and then sum them up on read, or perhaps have a process that
periodically aggregates them. It¹s a tricky business for sure, not one that
Cassandra is very well equipped to handle.

 

Robert

 

From: Manoj Khangaonkar <kh...@gmail.com>
Reply-To: <us...@cassandra.apache.org>
Date: Friday, January 10, 2014 at 2:50 PM
To: <us...@cassandra.apache.org>
Subject: Read/Write consistency issue

 

Hi 

 

Using Cassandra 2.0.0.

3 node cluster

Replication 2.

Using consistency ALL for both read and writes.
 

I have a single thread that reads a value, updates it and writes it back to
the table. The column type is big int. Updating counts for a timestamp.

 

With single thread and consistency ALL , I expect no lost updates. But as
seem from my application log below,

 

10 07:01:58,507 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
old=59614 val =252 new =59866

10 07:01:58,611 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
old=59866 val =252 new =60118

10 07:01:59,136 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
old=60118 val =255 new =60373

10 07:02:00,242 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
old=60373 val =243 new =60616

10 07:02:00,244 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
old=60616 val =19 new =60635

10 07:02:00,326 [Thread-10] BeaconCountersCAS2DAO [INFO] 1389366000  H
old=60616 val =233 new =60849

 

See the last 2 lines of the above log.

value 60616 is updated to 60635, but the next operation reads the old value
60616 again.

 

I am not using the counter column type because it does not support TTL, and I
hear there are a lot of open issues with counters.

 

Is there anything else I can do to further tighten the consistency, or is
this pattern of high-volume read-update-write not going to work in C*?

 

regards

MJ

 
-- 



RE: Read/Write consistency issue

Posted by Todd Carrico <To...@match.com>.
That, or roll your own locking. It means multiple updates, but it works reliably.

tc
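As a single-process illustration only (a distributed version would need
something like a lock row written with a TTL via a conditional insert, which
is considerably more involved), the locking idea looks like:

```java
public class Main {
    static long value = 0;
    static final Object lock = new Object(); // per-key lock in a real impl

    // Holding the lock across the whole read-modify-write serializes
    // updates, so no increment can be lost to interleaving.
    static long addLocked(long delta) {
        synchronized (lock) {
            value = value + delta;
            return value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) addLocked(1); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) addLocked(1); });
        a.start(); b.start(); a.join(); b.join();
        System.out.println(value); // all 2000 increments survive
    }
}
```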

From: Robert Wille [mailto:rwille@fold3.com]
Sent: Friday, January 10, 2014 4:25 PM
To: user@cassandra.apache.org
Subject: Re: Read/Write consistency issue

