Posted to user@cassandra.apache.org by Jen Smith <je...@yahoo.com> on 2016/05/12 17:32:12 UTC

client time stamp - force to be continuously increasing?

I'd like to get feedback/opinions on a possible workaround for a timestamp + data consistency edge case.
Context for this question:
When using client timestamps (default timestamp) on a C* version that supports them (v3 protocol), a record update is occasionally lost when updates are executed in rapid succession (less than a second between updates). This is because C*, by design (Last Write Wins), discards record updates that carry an 'older' timestamp, and the clock that supplies the timestamp (the client's when using client timestamps, the C* node's otherwise) can move backwards. The result is data loss: the cluster converges on the stale value rather than on the latest update.
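To make the failure mode concrete, here is a minimal sketch of last-write-wins resolution on a single cell. LwwCell and its methods are hypothetical names for illustration only, not Cassandra code, and tie-breaking on equal timestamps is simplified. It shows how a second update is discarded when the clock that stamped it had stepped backwards.

```java
/** Toy last-write-wins cell (hypothetical, not Cassandra code). */
final class LwwCell {
    private String value;
    private long timestamp = Long.MIN_VALUE;

    /**
     * Applies a write and returns true if it was kept.
     * A write with a timestamp lower than the stored one is discarded.
     * Simplification: on equal timestamps the incoming write wins here;
     * real tie-breaking rules are more involved.
     */
    boolean write(String newValue, long writeTimestamp) {
        if (writeTimestamp < timestamp) {
            return false; // looks "older", so it loses
        }
        value = newValue;
        timestamp = writeTimestamp;
        return true;
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        LwwCell cell = new LwwCell();
        // First update is stamped while the clock reads 1_000_500.
        boolean firstKept = cell.write("update-1", 1_000_500L);
        // The clock then steps back, so the later update gets a smaller timestamp.
        boolean secondKept = cell.write("update-2", 1_000_200L);
        System.out.println(firstKept + " " + secondKept + " " + cell.value());
    }
}
```

The second write was issued later in real time, but it carries the smaller timestamp, so the stored value stays at the first update.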
For anyone needing more background, this blog has much of the detail https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:
"Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source, which is backed by gettimeofday. Pretty much every Cassandra client out there does something similar. That means that the timestamps for writes made in a session are derived either from a single Cassandra server clock, or a single app server clock. These clocks can flow backwards, for a number of reasons:
- Hardware wonkiness can push clocks days or centuries into the future or past.
- Virtualization can wreak havoc on kernel timekeeping.
- Misconfigured nodes may not have NTP enabled, or may not be able to reach upstream sources.
- Upstream NTP servers can lie.
- When the problem is identified and fixed, NTP corrects large time differentials by jumping the clock discontinously to the correct time.
- Even when perfectly synchronized, POSIX time itself is not monotonic.
...
If the system clock goes backwards for any reason, Cassandra’s session consistency guarantees no longer hold."
This blog goes on to suggest a monotonic clock (ZooKeeper as a possibility, but slow), or better NTP syncing (which leaves gaps).
My question is whether this can be addressed in software by (using? abusing?) the client-provided timestamp field and forcing it to be continuously increasing, and what unexpected issues may arise from doing so.
Specifically, my idea is to set a timestamp on the record when it is created (from the system time of the client doing the create). Then, on subsequent updates, always set the default client timestamp to the result of:
currentRecordTs = Math.max(currentRecordTs + standardDelta, System.currentTimeMillis()); 
(where standardDelta is probably 1 second, i.e. 1000 when working in milliseconds)
Essentially this keeps a wall-clock guard on the record itself, to prevent backwards timestamping and lost data, and to ensure C* applies these updates in the proper order and does not discard any for being 'out of sequence' (i.e., persisted 'after' a record with a newer timestamp was already persisted).
One (acceptable) drawback is that the 'timestamp' being set will be slightly inaccurate whenever currentRecordTs + standardDelta > System.currentTimeMillis(), and that this skew could grow over time.
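As a rough illustration of the guard described above, here is a minimal sketch. RecordTimestampGuard, nextTimestamp and driftAhead are hypothetical names; the clock reading is passed in as a parameter (an assumption made so the behaviour can be exercised without touching the real system clock), and all values are in milliseconds as in the expression above. Cassandra write timestamps are conventionally microseconds since the epoch, so a value like this would likely need scaling before being used as the write timestamp; that conversion is left out here.

```java
/** Sketch of the per-record timestamp guard (hypothetical helper, values in milliseconds). */
final class RecordTimestampGuard {
    private RecordTimestampGuard() {
    }

    /**
     * Same expression as in the proposal: never go below the previous record
     * timestamp plus standardDelta, otherwise follow the wall clock.
     */
    static long nextTimestamp(long currentRecordTs, long standardDelta, long nowMillis) {
        return Math.max(currentRecordTs + standardDelta, nowMillis);
    }

    /** How far the record timestamp has run ahead of the wall clock (0 if not ahead). */
    static long driftAhead(long recordTs, long nowMillis) {
        return Math.max(0L, recordTs - nowMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long recordTs = now; // set at creation time
        // Three updates arriving within the same millisecond, standardDelta = 1000 ms.
        for (int i = 0; i < 3; i++) {
            recordTs = nextTimestamp(recordTs, 1_000L, now);
        }
        System.out.println("drift ahead of wall clock (ms): " + driftAhead(recordTs, now));
    }
}
```

One consequence worth noting: with standardDelta at 1000 ms, every update that arrives less than a second after the previous one pushes the record timestamp further ahead of the wall clock, so three updates in the same millisecond leave it 3 seconds ahead. A smaller delta would reduce how quickly that drift accumulates.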
Would you please advise me of any other problems, downstream effects, pitfalls or data consistency issues this approach might cause? For example, will C* object if the 'quasi' timestamp gets 'too far' into the future?
More info: the system in question uses LOCAL_QUORUM read/write consistency, and usually only one client (C* session) is updating a given record at a time (although concurrent updates from multiple clients are allowed; LWW is expected for that scenario, and some ambiguity there is OK).
I apologize if this is a duplicate post to the list from me - I first sent this question when I was not yet subscribed to the list, so I am not sure whether it went through.
Thank you kindly for the advice,
J. Smith

Re: client time stamp - force to be continuously increasing?

Posted by Jen Smith <je...@yahoo.com>.

Thank you - I think this driver solution may address a portion of the problem?
Since this solution lives in the driver, is it correct to assume that although it could potentially fix the issue within a single (client) session, it could not fix it for a pool of clients, where client A sends the first update and client B sends the second (because driver sessions don't share memory/data between clients)? If so, I think this doesn't provide the full HA client solution.
But I think it does help to confirm that a 'reasonable' approach to solving the overall problem is software enforcement of a 'rigorously increasing timestamp', with the understood impact of drifting our timestamps out into the future (when a conflict is identified). It also sounds, from that Jira ticket, like the continually updating increment can be milliseconds, not seconds (note we are not using batch statements, which I believe have/had a TS granularity bug in the past).

      From: Alexandre Dutra <al...@datastax.com>
 To: Jen Smith <je...@yahoo.com>; "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Thursday, May 12, 2016 11:28 AM
 Subject: Re: client time stamp - force to be continuously increasing?

Hi,
Among the ideas worth exploring, please note that the DataStax Java driver for Cassandra now includes a modified version of its monotonic timestamp generators that will indeed strive to provide rigorously increasing timestamps, even in the event of a system clock skew (in which case, they would keep drifting into the future). Such generators obviously do not pretend to provide the same monotonicity guarantees as a vector clock, but have at least the advantage of being fairly easy to set up. See JAVA-727[1] for details.
Hope that helps,
Alexandre
[1] https://datastax-oss.atlassian.net/browse/JAVA-727

Re: client time stamp - force to be continuously increasing?

Posted by Alexandre Dutra <al...@datastax.com>.

Hi,

Among the ideas worth exploring, please note that the DataStax Java driver
for Cassandra now includes a modified version of its monotonic timestamp
generators that will indeed strive to provide rigorously increasing
timestamps, even in the event of a system clock skew (in which case, they
would keep drifting into the future). Such generators obviously do not
pretend to provide the same monotonicity guarantees as a vector clock, but
have at least the advantage of being fairly easy to set up. See JAVA-727[1]
for details.

Hope that helps,

Alexandre

[1] https://datastax-oss.atlassian.net/browse/JAVA-727

-- 
Alexandre Dutra
Driver & Tools Engineer @ DataStax

Re: client time stamp - force to be continuously increasing?

Posted by Jen Smith <je...@yahoo.com>.

To clarify: currentRecordTs would be saved in a field on the record being persisted.

Re: client time stamp - force to be continuously increasing?

Posted by Alexandre Dutra <al...@datastax.com>.

> Since this solution is from the driver, is it correct to assume that
> although this could potentially fix the issue within a single (client)
> session, it could not fix it for a pool of clients, where client A sent the
> first update and client B sent the 2nd one (because driver session doesn't
> share memory/data between clients)? Is this correct? if so, I think this
> doesn't provide the full HA client solution.

That is correct.

> but I think it does help to confirm that a 'reasonable' approach to
> solving the overall problem is software enforcement of a 'rigorously
> increasing timestamp' with the understood impact of drifting our timestamps
> out into the future (when conflict is identified).  it also sounds like
> from that jira ticket, the continually updating increment can be
> milliseconds, not seconds.

Actually, the driver's timestamp generators all have microsecond
granularity, so timestamps get incremented microsecond by microsecond when
the system clock goes astray.
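The behaviour described above can be sketched in a few lines. This is not the driver's implementation (see JAVA-727 for that); MonotonicMicrosGenerator is a hypothetical class, and the injectable millisecond clock is an assumption made so the sketch can be exercised with a fake clock. It returns the wall clock in microseconds when the clock has moved forward, and otherwise the last value plus one microsecond.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Sketch of an in-process monotonic timestamp generator (hypothetical, microsecond units). */
final class MonotonicMicrosGenerator {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
    private final LongSupplier clockMillis;

    /** clockMillis would typically be System::currentTimeMillis. */
    MonotonicMicrosGenerator(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Returns a value strictly greater than any value previously returned by this instance. */
    long next() {
        long nowMicros = clockMillis.getAsLong() * 1000L;
        // If the clock has not advanced past the last value (same tick, or it
        // went backwards), step forward by one microsecond instead.
        return last.updateAndGet(prev -> prev >= nowMicros ? prev + 1 : nowMicros);
    }

    public static void main(String[] args) {
        MonotonicMicrosGenerator generator = new MonotonicMicrosGenerator(System::currentTimeMillis);
        long first = generator.next();
        long second = generator.next();
        System.out.println(second > first); // prints "true"
    }
}
```

The state lives in a single AtomicLong inside one process, which is why this kind of generator only orders writes issued through that one instance and cannot coordinate timestamps across separate clients.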


-- 
Alexandre Dutra
Driver & Tools Engineer @ DataStax

Re: client time stamp - force to be continuously increasing?

Posted by Eric Evans <jo...@gmail.com>.

On Thu, May 12, 2016 at 10:32 AM, Jen Smith <je...@yahoo.com> wrote:
>
> My question is if this can be addressed via software by (using? abusing?)
> the client provided timestamp field and forcing it to be continuously
> increasing, and what unexpected issues may arise from doing so?

I'm not sure if you're looking for an answer for your own particular
use-case(s), or something more general, but I thought I'd point out
that client-provided timestamps are as much a feature in other
contexts as they are a bug in this one, precisely because they *can*
be out of order; they are meant to represent a causal ordering.

In other words, if you somehow unconditionally enforced monotonicity
on the server side[*] you'd break a whole other set of use-cases.

[*]: But, how would you even manage such a thing without coordination,
in a distributed environment?
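One way to picture the point above is the following sketch, which compares resolving writes by the timestamp each write carries against a hypothetical server that orders writes purely by arrival. ArrivalOrderDemo, Write and both resolve methods are made-up names for illustration, not Cassandra behaviour or API.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: client timestamps versus arrival order (hypothetical names throughout). */
final class ArrivalOrderDemo {

    /** A write carrying the timestamp assigned by the client that issued it. */
    static final class Write {
        final String value;
        final long clientTs;

        Write(String value, long clientTs) {
            this.value = value;
            this.clientTs = clientTs;
        }
    }

    /** Last write wins, judged by the timestamp the client supplied. */
    static String resolveWithClientTimestamps(List<Write> arrivals) {
        String value = null;
        long best = Long.MIN_VALUE;
        for (Write w : arrivals) {
            if (w.clientTs > best) {
                best = w.clientTs;
                value = w.value;
            }
        }
        return value;
    }

    /** Hypothetical server-enforced ordering: whatever arrives last wins. */
    static String resolveByArrival(List<Write> arrivals) {
        String value = null;
        for (Write w : arrivals) {
            value = w.value;
        }
        return value;
    }

    public static void main(String[] args) {
        Write first = new Write("first", 1L);
        Write second = new Write("second", 2L);
        // The causally earlier write is delayed and arrives after the later one.
        List<Write> delayed = Arrays.asList(second, first);
        System.out.println(resolveWithClientTimestamps(delayed)); // prints "second"
        System.out.println(resolveByArrival(delayed));            // prints "first"
    }
}
```

With client timestamps the outcome is the same whichever write arrives first, which is what lets delayed or replayed writes land safely; ordering by arrival makes the outcome depend on network timing.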

-- 
Eric Evans
john.eric.evans@gmail.com