Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/04/01 21:36:18 UTC

[jira] [Comment Edited] (CASSANDRA-6106) QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() / 1000

    [ https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956938#comment-13956938 ] 

Benedict edited comment on CASSANDRA-6106 at 4/1/14 7:34 PM:
-------------------------------------------------------------

It doesn't look safe to me to simply grab gtod.wall_time_sec anyway, even if we could find its location, as the nanos value gets repaired by a further call after it is read. We could investigate further, but for the time being I have a reasonably straightforward solution [here|http://github.com/belliottsmith/cassandra/tree/6106-microstime].

I started by simply calling the librt clock_gettime method through JNA, which unfortunately clocks in at a heavy ~7 micros per call; since nanoTime and currentTimeMillis each cost < 0.03 micros, this seemed unacceptable. So I've opted instead to periodically (once per second) grab the latest micros time via the best method available (clock_gettime if present, currentTimeMillis * 1000 otherwise) and use it to reset the offset added to nanoTime. However, to ensure a smooth transition, I do the following (sketched in code after this list):

# Cap the rate of change at 50ms per second
# Ensure it never leaps backwards in time, at least on any given thread (there is no cheap way to guarantee anything stronger than this)
# Only apply a change if it is at least 1ms out, to avoid noise (possibly this should be tightened to 100 micros, or made dependent on the resolution of the time library we're using)
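
To make the scheme concrete, here is a minimal sketch of the idea in plain Java. The class and constant names are illustrative rather than taken from the linked branch, and currentTimeMillis stands in where the real patch would prefer clock_gettime via JNA:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch, not the actual patch: micros = nanoTime()/1000 + offset,
// with the offset resynced once per second and corrections smoothed to avoid jitter.
public final class SmoothedMicrosClock
{
    // offset such that nanoTime()/1000 + offset ~= wall-clock micros
    private static final AtomicLong offsetMicros = new AtomicLong(
            System.currentTimeMillis() * 1000 - System.nanoTime() / 1000);

    private static final long MAX_SLEW_MICROS = 50_000; // cap: 50ms of change per 1s resync
    private static final long MIN_ERROR_MICROS = 1_000; // ignore corrections smaller than 1ms

    // per-thread floor so time never appears to run backwards on a given thread
    private static final ThreadLocal<long[]> lastReturned =
            ThreadLocal.withInitial(() -> new long[1]);

    static
    {
        ScheduledExecutorService resync = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "micros-clock-resync");
            t.setDaemon(true);
            return t;
        });
        resync.scheduleAtFixedRate(SmoothedMicrosClock::resync, 1, 1, TimeUnit.SECONDS);
    }

    private static void resync()
    {
        // Best available wall-clock source; the real patch prefers
        // clock_gettime via JNA when the library is available.
        long wallMicros = System.currentTimeMillis() * 1000;
        long newOffset = wallMicros - System.nanoTime() / 1000;

        long current = offsetMicros.get();
        long error = newOffset - current;
        if (Math.abs(error) < MIN_ERROR_MICROS)
            return; // within noise: leave the offset alone
        // cap the slew so the clock drifts smoothly toward the true time
        long step = Math.max(-MAX_SLEW_MICROS, Math.min(MAX_SLEW_MICROS, error));
        offsetMicros.addAndGet(step);
    }

    public static long timestampMicros()
    {
        long now = System.nanoTime() / 1000 + offsetMicros.get();
        long[] floor = lastReturned.get();
        // never go backwards on this thread, even if the offset was slewed down
        if (now < floor[0])
            now = floor[0];
        floor[0] = now;
        return now;
    }
}
{code}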

The result is a method that costs around the same as a raw call to System.nanoTime() but gives pretty decent accuracy. Obviously any method that derives an offset from a call taking ~7 micros to return will have some inherent inaccuracy, but no more than the ~7 micro direct call would itself, and the inaccuracy will be consistent given the jitter reduction I'm applying. At startup we also sample the offset 10k times, derive a 90%ile for the elapsed time of fetching the offset (we ignore any future offset we calculate whose sample takes more than twice this period), and average all of the startup samples within the 90%ile.
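
For the startup calibration, a rough sketch of the sampling-and-percentile step (again with illustrative names, and currentTimeMillis standing in for the JNA clock_gettime call):

{code:java}
import java.util.Arrays;

// Illustrative sketch of the startup calibration: sample the offset many times,
// find the 90%ile of how long each sample took, and average only the offsets
// whose sampling time fell within that percentile.
public final class OffsetCalibration
{
    public static long calibrateOffsetMicros(int samples)
    {
        long[] offsets = new long[samples];
        long[] elapsed = new long[samples];
        for (int i = 0; i < samples; i++)
        {
            long before = System.nanoTime();
            long wallMicros = System.currentTimeMillis() * 1000; // stand-in for clock_gettime
            long after = System.nanoTime();
            long midNanos = before + (after - before) / 2; // midpoint, avoiding overflow
            offsets[i] = wallMicros - midNanos / 1000;
            elapsed[i] = after - before;
        }

        long[] sorted = elapsed.clone();
        Arrays.sort(sorted);
        long p90 = sorted[(int) (samples * 0.9)];

        // average only the samples that were fetched quickly enough to trust;
        // accumulate deltas from a base offset so the sum cannot overflow
        long base = offsets[0];
        long sum = 0;
        int count = 0;
        for (int i = 0; i < samples; i++)
        {
            if (elapsed[i] <= p90)
            {
                sum += offsets[i] - base;
                count++;
            }
        }
        return base + sum / count;
    }

    public static void main(String[] args)
    {
        System.out.println("calibrated offset (micros): " + calibrateOffsetMicros(10_000));
    }
}
{code}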




> QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() / 1000
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: DSE Cassandra 3.1, but also HEAD
>            Reporter: Christopher Smith
>            Assignee: Benedict
>            Priority: Minor
>              Labels: timestamps
>             Fix For: 2.1 beta2
>
>         Attachments: microtimstamp.patch, microtimstamp_random.patch, microtimstamp_random_rev2.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra mentioned issues with millisecond rounding in timestamps and was able to reproduce the issue. If I specify a timestamp in a mutating query, I get microsecond precision, but if I don't, I get timestamps rounded to the nearest millisecond, at least for my first query on a given connection, which substantially increases the possibility of collisions.
> I believe I found the offending code, though I am by no means sure this is comprehensive. I think we probably need a fairly comprehensive replacement of all uses of System.currentTimeMillis() with System.nanoTime().
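
(A tiny illustration of the rounding the report describes, not code from any patch: micros derived from currentTimeMillis() always end in three zeros, while nanoTime() has microsecond resolution but no wall-clock epoch, which is why a straight substitution isn't sufficient on its own.)

{code:java}
// Illustration only: the rounding problem the report describes.
public class MillisRoundingDemo
{
    public static void main(String[] args)
    {
        // always a multiple of 1000: two un-timestamped mutations in the
        // same millisecond would collide
        long fromMillis = System.currentTimeMillis() * 1000;
        // full microsecond resolution, but measured from an arbitrary epoch,
        // so it cannot serve as a wall-clock timestamp by itself
        long fromNanos = System.nanoTime() / 1000;
        System.out.println("millis-derived micros:   " + fromMillis);
        System.out.println("nanoTime-derived micros: " + fromNanos);
    }
}
{code}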



--
This message was sent by Atlassian JIRA
(v6.2#6252)