Posted to solr-user@lucene.apache.org by Dennis Gearon <ge...@sbcglobal.net> on 2010/11/02 05:41:43 UTC

RE: Ensuring stable timestamp ordering

How about a timestamp with a GUID appended to the end of it?


Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Sun, 10/31/10, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> From: Toke Eskildsen <te...@statsbiblioteket.dk>
> Subject: RE: Ensuring stable timestamp ordering
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Date: Sunday, October 31, 2010, 12:18 PM
> Dennis Gearon [gearond@sbcglobal.net]
> wrote:
> > Even microseconds may not be enough on some really
> > good, fast machine.
> 
> True, especially since the timer might not provide
> microsecond granularity although the returned value is in
> microseconds. However, a unique timestamp generator should
> keep track of the previous timestamp to guard against
> duplicates. Uniqueness can thus be guaranteed by waiting a
> bit or cheating on the decimals. With microseconds, one can
> produce 1 million timestamps / second. While I agree that
> duplicates within microseconds can occur on a fast machine,
> guaranteeing uniqueness by waiting should only be a
> performance problem when the number of duplicates is high.
> That's still a few years off, I think.
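
A minimal sketch of the "keep track of the previous timestamp" idea described above, assuming a single JVM and a millisecond clock scaled to microseconds. The class name UniqueMicros and its methods are made up for illustration; the sketch "cheats on the decimals" by bumping the value by one instead of waiting.

```java
import java.util.concurrent.atomic.AtomicLong;

public class UniqueMicros {
    // Last value handed out; shared by all threads in this JVM.
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    /** Returns a microsecond timestamp strictly greater than any earlier result. */
    public long next() {
        // Assumption: millisecond clock scaled up, so real granularity is 1 ms.
        return next(System.currentTimeMillis() * 1000L);
    }

    /** Same as next(), but with the clock reading supplied by the caller. */
    public long next(long nowMicros) {
        // If the clock has not advanced past the previous value, bump by one.
        return last.updateAndGet(prev -> Math.max(nowMicros, prev + 1));
    }

    public static void main(String[] args) {
        UniqueMicros gen = new UniqueMicros();
        long a = gen.next();
        long b = gen.next();
        System.out.println(a + " < " + b + " : " + (a < b));
    }
}
```

Bumping instead of waiting means the values can drift ahead of the wall clock if more than one million timestamps per second are requested for a sustained period.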
> 
> As Michael pointed out, using normal timestamps as unique
> IDs might not be such a great idea as it effectively locks
> index-building to a single JVM. By going the ugly route and
> expressing the time in nanos with only microsecond
> granularity and using the last 3 decimals for a builder ID,
> this could be fixed. Not very clean though, as the contract
> is not expressed in the data themselves but must
> nevertheless be obeyed by all builders to avoid collisions.
> It also raises the question of who should assign the builder
> IDs. Not trivial in an anarchistic setup where new builders
> can be added by different controllers.
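
The "last 3 decimals for a builder ID" layout could look roughly like the sketch below. The class BuilderStamp and its method names are hypothetical, and the sketch assumes non-negative microsecond timestamps and builder IDs in the range 0..999 that someone has already assigned.

```java
public class BuilderStamp {
    /** Combines a microsecond timestamp with a builder ID in 0..999. */
    public static long encode(long micros, int builderId) {
        if (builderId < 0 || builderId > 999) {
            throw new IllegalArgumentException("builderId must be in 0..999");
        }
        // The value looks like nanoseconds, but the last 3 digits are the ID.
        return micros * 1000L + builderId;
    }

    /** Extracts the microsecond timestamp part. */
    public static long micros(long stamp) {
        return stamp / 1000L;
    }

    /** Extracts the builder ID part. */
    public static int builderId(long stamp) {
        return (int) (stamp % 1000L);
    }
}
```

Sorting on the combined value still orders primarily by time, with the builder ID acting only as a tie-breaker within the same microsecond.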
> 
> Pragmatists might use the PID % 1000 or similar for the
> builder ID as it does not require coordination, but this is
> where the Birthday Paradox hits us again: The chance of two
> processes on different machines getting the same builder ID is 10%
> if just 15 machines are used (1% for 5 machines, 50% for 37
> machines). I don't like those odds and that's assuming that
> the PIDs will be randomly distributed, which they won't. It
> could be lowered by reserving more decimals for the salt,
> but then we would decrease the maximum amount of timestamps
> / second, still without guaranteed uniqueness. Guys a lot
> smarter than me have spent time on the unique ID problem and
> it's clearly not easy: Java's UUID takes up 128 bits.
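
The collision odds quoted above can be checked with a short calculation. This sketch assumes builder IDs are drawn uniformly at random from 1000 values, which, as noted, real PIDs are not; the class name BirthdayOdds is made up for illustration.

```java
public class BirthdayOdds {
    /** Probability that at least two of n uniformly random IDs out of `slots` coincide. */
    public static double collision(int n, int slots) {
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            // The (i+1)-th ID must avoid the i IDs already taken.
            allDistinct *= (double) (slots - i) / slots;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 15, 37}) {
            System.out.printf("%d machines: %.1f%%%n", n, 100 * collision(n, 1000));
        }
    }
}
```

Under the uniform assumption the results land close to the 1%, 10% and 50% figures given for 5, 15 and 37 machines.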
> 
> - Toke

Re: Ensuring stable timestamp ordering

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Memory's cheap! (I know processing it is not, though.)

 Dennis Gearon


----- Original Message ----
From: Toke Eskildsen <te...@statsbiblioteket.dk>
To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
Sent: Mon, November 1, 2010 11:45:34 PM
Subject: RE: Ensuring stable timestamp ordering

Dennis Gearon [gearond@sbcglobal.net] wrote:
> How about a timestamp with a GUID appended to the end of it?

Since long (8 bytes) is the largest primitive type supported by Java, this would 
have to be represented as a String (or rather BytesRef) and would take up 4 + 32 
bytes + 2 * 4 bytes from the internal BytesRef-attributes + some extra overhead. 
That is quite a large memory penalty to ensure unique timestamps.

RE: Ensuring stable timestamp ordering

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Dennis Gearon [gearond@sbcglobal.net] wrote:
> How about a timestamp with a GUID appended to the end of it?

Since long (8 bytes) is the largest primitive type supported by Java, this would have to be represented as a String (or rather BytesRef) and would take up 4 + 32 bytes + 2 * 4 bytes from the internal BytesRef-attributes + some extra overhead. That is quite a large memory penalty to ensure unique timestamps.
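
To make the size concern concrete, here is a rough sketch of a timestamp with a GUID appended, packed as raw bytes rather than as a String. The class StampedId is hypothetical and this is not how Lucene or Solr store such values; it only shows that the payload alone is 24 bytes (8 for the long, 16 for the 128-bit UUID), three times a plain long, before any object or BytesRef overhead.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class StampedId {
    /** 8-byte big-endian timestamp followed by the 16 bytes of the UUID. */
    public static byte[] pack(long timestamp, UUID uuid) {
        return ByteBuffer.allocate(24)
                .putLong(timestamp)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Reads the timestamp back from the first 8 bytes. */
    public static long timestamp(byte[] packed) {
        return ByteBuffer.wrap(packed).getLong();
    }

    public static void main(String[] args) {
        byte[] id = pack(System.currentTimeMillis(), UUID.randomUUID());
        System.out.println(id.length + " bytes, timestamp " + timestamp(id));
    }
}
```

Putting the timestamp first keeps values with different timestamps grouped by time, while the UUID part only distinguishes entries that share a timestamp.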