Posted to dev@commons.apache.org by Tim Reilly <ti...@consultant.com> on 2004/03/02 02:39:32 UTC

RE: [id] UUID update

> I will look at this stuff carefully this weekend, but one thing that
> jumped out at me from your post above was that the "global lock" issue
> might be avoidable by putting more into the node identifier, i.e., build
> in a jvm identifier.  IIRC, this is essentially what tomcat does when
> generating session ids to avoid collisions across jvms on the same host.
> Just something to think about.
>
> Phil
>

I'm back. I'm thinking about this suggestion... My original concern is both
with physical machines that run multiple jvm versions, as well as multiple
instances of the same jvm (such as in an application server that is
vertically clustered.)

I don't know of an identifier that I can get that uniquely identifies a jvm
instance, but there may be something (I'll dig around the tomcat code.)
Coincidentally, I do iterate the System properties and then create an MD5 of
those concatenated with a random number and a (new Object().hashCode()) to
generate an artificial node id in StateHelper (used in the
InMemoryStateImpl). Keep in mind that *ideally* the Node.id is always the
MAC address(es), though.
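
For illustration, here is a minimal sketch of the artificial node id idea
described above. It is not the StateHelper code: the class name
ArtificialNodeId, the sorted property order, and the choice to keep the
first 6 bytes of the digest are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical helper, not the StateHelper class from the patch.
final class ArtificialNodeId {

    /** Derives a 6-byte node id from system properties plus per-JVM noise. */
    static byte[] generate(Random random) {
        StringBuilder seed = new StringBuilder();
        // Sort the property names so the concatenation order is stable.
        for (String name : new TreeSet<>(System.getProperties().stringPropertyNames())) {
            seed.append(name).append('=').append(System.getProperty(name)).append(';');
        }
        // Mix in values that are likely to differ between JVM instances.
        seed.append(random.nextLong());
        seed.append(new Object().hashCode());
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(seed.toString().getBytes(StandardCharsets.UTF_8));
            // Assumption: keep the first 6 of the 16 digest bytes as the node id.
            byte[] node = new byte[6];
            System.arraycopy(digest, 0, node, 0, node.length);
            return node;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Two JVMs on the same host would be expected to get different ids only
because of the random number and the object hash code, so this makes
duplicates unlikely rather than impossible.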

I think I submitted quite a bit to digest in the zip file. Something that
may make things easier all around might be if I start on creating patches
just for the VersionFour uuid generator (random uuid)? We can work out
naming, package structure, and how best to tie in the factory and
IdentifierUtils.
Afterwards, we can look at the decisions about the more complex/troublesome
version 1 generator.
If this sounds good to everyone I'll start a new thread around that?


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


RE: [id] UUID update

Posted by Tim Reilly <ti...@consultant.com>.
> I would attribute the "cryptographic quality" reference in section 4 as
> just referring to randomization.  Making the PRNG pluggable might
> be a good
> compromise solution.
>
> Phil

I think that sounds good for the version 4 (random bytes) uuid. I think
that's what you meant? Version 1 uses an MD5 of system info or the MAC
address.
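
As a rough sketch of what a pluggable PRNG could look like for the version 4
case: the class name RandomUuidGenerator is hypothetical, and java.util.UUID
is used here only as a convenient container for the 128 bits. The caller
picks the generator, so either Random or SecureRandom can be passed in.

```java
import java.util.Random;
import java.util.UUID;

// Hypothetical class, not the VersionFourGenerator from the patch.
final class RandomUuidGenerator {
    private final Random random;

    /** Accepts any java.util.Random, including java.security.SecureRandom. */
    RandomUuidGenerator(Random random) {
        this.random = random;
    }

    UUID next() {
        long msb = random.nextLong();
        long lsb = random.nextLong();
        // Set the 4-bit version field to 4 (randomly generated UUID).
        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L;
        // Set the two most significant variant bits to binary 10.
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }
}
```

Usage would be something like new RandomUuidGenerator(new
SecureRandom()).next() where unpredictability matters, or new
RandomUuidGenerator(new Random()).next() where speed matters more.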

The other calls to SecureRandom should be changed now that I understand more
about how random Random is.
I can send patches, unless anyone wants to beat me to it.




RE: [id] UUID update

Posted by Phil Steitz <st...@yahoo.com>.
--- Tim Reilly <ti...@consultant.com> wrote:
> Hi Phil,
> 
> > Why, btw, did you think that we
> > needed to use SecureRandom?  Is there any expectation in the spec
> > that the
> > random data will be cryptographically secure?
> 
> Quoting
> http://www.ietf.org/internet-drafts/draft-mealling-uuid-urn-02.txt
> =============================================
> 4.5 Node IDs that do not identify the host
> 
>    This section describes how to generate a version 1 UUID if an IEEE
>    802 address is not available, or its use is not desired.
> 
>    One approach is to contact the IEEE and get a separate block of
>    addresses. At the time of writing, the application could be found at
>    [6], and the cost was US$550.
> 
>    A better solution is to obtain a 47-bit cryptographic quality random
>    number, and use it as the low 47 bits of the node ID, with the most
> =============================================
> 
> I took the last paragraph to mean use SecureRandom. I also took for
> granted
> that SecureRandom is "more" random than Random but haven't found
> authoritative advice on that, it seems to be true or well believed?
> Anyone
> know of documentation on the issue? The spec emphasizes the term "high
> quality random".

It all comes down to whether or not you care if a *sequence* of randomly
generated values is predictable. The default Random generator is not bad in
terms of dispersion (i.e., likelihood of collisions, number of matches
among bytes, etc.), but it is not "secure". With a secure generator,
knowledge of previous results has little or no value in predicting new
values; with Random it does.  See
http://en.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator
for a little more on this.
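
To make the predictability point concrete, here is a small sketch that
guesses the next value of a java.util.Random from two consecutive nextInt()
results. It relies on the linear congruential formula and constants given in
the java.util.Random documentation; the class and method names are
hypothetical.

```java
// Hypothetical demo class showing why java.util.Random is not "secure".
final class RandomPredictor {
    // Constants of the generator documented for java.util.Random.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    /**
     * Given two consecutive nextInt() results, returns the expected third.
     * nextInt() exposes the top 32 bits of the 48-bit state, so only the
     * low 16 bits have to be searched.
     */
    static int predictNextInt(int first, int second) {
        long upperBits = (first & 0xFFFFFFFFL) << 16;
        for (long low = 0; low < (1L << 16); low++) {
            long state = upperBits | low;
            long next = (state * MULTIPLIER + ADDEND) & MASK;
            if ((int) (next >>> 16) == second) {
                // First matching candidate; a spurious match is possible but rare.
                long following = (next * MULTIPLIER + ADDEND) & MASK;
                return (int) (following >>> 16);
            }
        }
        throw new IllegalArgumentException("not consecutive java.util.Random outputs");
    }
}
```

Feeding it the first two results of r.nextInt() for some Random r should
give the value that the third r.nextInt() returns. No comparable shortcut is
expected to exist for SecureRandom, which is the whole difference.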

> 
> As far as I understand everything in the specification concerning version
> 1, the only real guarantee that the id will be *universally* unique is
> when using the MAC address, since it's from a central authority. (Time
> shifting backwards is still an issue even when a MAC is used, but the
> clock sequence will deal with it if the generator does the time check and
> has stateful information to know to change the sequence. I won't mention
> that some NIC manufacturers may have recycled MAC addresses, according to
> some posts I saw; failures in the state persistence are also possible.)

Yes.

> Using alternative node identifiers should be done in a way that makes a
> duplicate highly/extremely unlikely, but the potential/probability
> (though small) still exists that different systems generate the same
> node, time, and clock sequence. It also depends on usage: duplicates may
> be generated in systems that never interact, so the probability of
> duplicates within a group of system actors is really what we care about
> in reality (usually, I guess.)

SecureRandom offers no special advantage there. Two byte arrays generated
by a SecureRandom are no less likely to collide than a pair generated by a
Random (at least not significantly -- though the exact probabilities no
doubt differ, both will be remarkably close to expected binomial
probabilities in practice).
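
As a back-of-the-envelope illustration of the collision point, the sketch
below uses the usual birthday approximation, 1 - exp(-n(n-1)/(2N)), for n
values drawn uniformly from N = 2^bits possibilities. The uniformity
assumption and the class name are mine; nothing here depends on whether the
values came from Random or SecureRandom.

```java
// Hypothetical helper: birthday-bound estimate of a collision.
final class CollisionEstimate {

    /**
     * Approximate probability that at least two of {@code count} uniformly
     * random values of {@code bits} bits are equal.
     */
    static double probability(long count, int bits) {
        double space = Math.pow(2.0, bits);          // N = 2^bits
        double pairs = count * (count - 1.0) / 2.0;  // n(n-1)/2 possible pairs
        // expm1 keeps precision when the probability is very small.
        return -Math.expm1(-pairs / space);
    }
}
```

For example, CollisionEstimate.probability(1000, 47) estimates the chance
that any two of 1000 random 47-bit node ids coincide.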
> 
> I should mention that a security concern was raised about using the MAC
> address when it will be exposed to other/outside systems. One side of
> the security concern is around privacy (the uuid can give away a time and
> a PC.) The other security issue, I assume, is if you've set up some sort
> of inbound firewall rules based on MAC addresses, or have other reasons
> not to publish your MACs? I believe uuid.so and win32 CoCreateGuid both
> now generate version 4 uuids for that reason. Of course, the reality is
> that the version and implementation someone chooses should be guided by
> their requirements and how they are using it (between secure/trusted
> systems or not.)
> 
> Sorry for going on and on in answering a simple question; I took the
> opportunity to share some of the research I'd done on the topic. It
> should be, and probably needs to be, in the documentation.
> 

This is an important issue. I have never viewed GUIDs as in any way
"secure."  As you point out, the MAC address is not at all secure. Maybe we
should support both "secure" and "non-secure" version 1 GUIDs. 

The comment below in Section 7 suggests, however, that there is no
expectation of cryptographic security, so I guess I would favor the
"non-secure" approach:

 Security Considerations

   Do not assume that UUIDs are hard to guess; they should not be used
   as capabilities, for example.

I would attribute the "cryptographic quality" reference in section 4 as
just referring to randomization.  Making the PRNG pluggable might be a good
compromise solution.

Phil


RE: [id] UUID update

Posted by Tim Reilly <ti...@consultant.com>.
Hi Phil,

> Why, btw, did you think that we
> needed to use SecureRandom?  Is there any expectation in the spec
> that the
> random data will be cryptographically secure?

Quoting http://www.ietf.org/internet-drafts/draft-mealling-uuid-urn-02.txt
=============================================
4.5 Node IDs that do not identify the host

   This section describes how to generate a version 1 UUID if an IEEE
   802 address is not available, or its use is not desired.

   One approach is to contact the IEEE and get a separate block of
   addresses. At the time of writing, the application could be found at
   [6], and the cost was US$550.

   A better solution is to obtain a 47-bit cryptographic quality random
   number, and use it as the low 47 bits of the node ID, with the most
=============================================

I took the last paragraph to mean use SecureRandom. I also took for granted
that SecureRandom is "more" random than Random, but I haven't found
authoritative advice on that; it seems to be true, or at least widely
believed? Anyone know of documentation on the issue? The spec emphasizes
the term "high quality random".

As far as I understand everything in the specification concerning version
1, the only real guarantee that the id will be *universally* unique is when
using the MAC address, since it's from a central authority. (Time shifting
backwards is still an issue even when a MAC is used, but the clock
sequence will deal with it if the generator does the time check and has
stateful information to know to change the sequence. I won't mention that
some NIC manufacturers may have recycled MAC addresses, according to some
posts I saw; failures in the state persistence are also possible.)
Using alternative node identifiers should be done in a way that makes a
duplicate highly/extremely unlikely, but the potential/probability (though
small) still exists that different systems generate the same node, time,
and clock sequence. It also depends on usage: duplicates may be generated
in systems that never interact, so the probability of duplicates within a
group of system actors is really what we care about in reality (usually, I
guess.)
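
To illustrate the time check mentioned above, here is a minimal in-memory
sketch of bumping the clock sequence when the timestamp goes backwards. The
class name is hypothetical, the 14-bit width is the clock sequence size in a
version 1 UUID, and persisting the state across restarts is deliberately
left out.

```java
// Hypothetical in-memory tracker; real state would need to be persisted.
final class ClockSequenceTracker {
    private static final int CLOCK_SEQ_MASK = 0x3FFF; // 14-bit clock sequence

    private long lastTimestamp = Long.MIN_VALUE;
    private int clockSequence;

    ClockSequenceTracker(int initialClockSequence) {
        this.clockSequence = initialClockSequence & CLOCK_SEQ_MASK;
    }

    /** Returns the clock sequence to use for a UUID with this timestamp. */
    synchronized int clockSequenceFor(long timestamp) {
        if (timestamp < lastTimestamp) {
            // The clock moved backwards: change the sequence so ids generated
            // now cannot repeat ones issued before the shift.
            clockSequence = (clockSequence + 1) & CLOCK_SEQ_MASK;
        }
        lastTimestamp = timestamp;
        return clockSequence;
    }
}
```

This only covers a clock that moves backwards within one running generator;
two calls with the same timestamp would still need separate handling.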

I should mention that a security concern was raised about using the MAC
address when it will be exposed to other/outside systems. One side of the
security concern is around privacy (the uuid can give away a time and a PC.)
The other security issue, I assume, is if you've set up some sort of inbound
firewall rules based on MAC addresses, or have other reasons not to publish
your MACs? I believe uuid.so and win32 CoCreateGuid both now generate
version 4 uuids for that reason. Of course, the reality is that the version
and implementation someone chooses should be guided by their requirements
and how they are using it (between secure/trusted systems or not.)

Sorry for going on and on in answering a simple question; I took the
opportunity to share some of the research I'd done on the topic. It should
be, and probably needs to be, in the documentation.




Re: [id] UUID update

Posted by Phil Steitz <ph...@steitz.com>.
Tim Reilly wrote:
>>Phil Steitz wrote:
> 

> I imagine performance tests of anything using SecureRandom
> (VersionFourGenerator, and InMemoryStateImpl) may be dismal due to
> initialization. 

Not just initialization.  The calls are also *much* slower than Random.

> I think I did a static reference to the SecureRandom, so it
> should only be the first use that takes a while. 

Yes, the static reference is there.  Why, btw, did you think that we 
needed to use SecureRandom?  Is there any expectation in the spec that the 
random data will be cryptographically secure?

Phil


RE: [id] UUID update

Posted by Tim Reilly <ti...@consultant.com>.
> Phil Steitz wrote:
...
> than updating the Apache license to 2.0. This is a good start. We need to
> get a better feel for stability / performance and some more eyeballs on
> this code, so I thought it best to get it into CVS now, even if we decide
> to refactor / repackage down the road. Thanks for the contribution.

It is a lot easier to talk about when it's under source control. The more
eyes for review the better, and as you mention below, documenting the "whys"
will be helpful to that end. Let me know your thoughts on additional tests.
One thing I wanted to make sure of was that the writes and the finalize in
ReadWriteFileImpl all happen as expected.

I imagine performance tests of anything using SecureRandom
(VersionFourGenerator, and InMemoryStateImpl) may be dismal due to
initialization. I think I did a static reference to the SecureRandom, so it
should only be the first use that takes a while. Some other strategies can
be built on top of what's there, like burst generation into a queue if need
be.
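
A rough sketch of the burst-into-a-queue idea, assuming a shared static
SecureRandom and version 4 ids. BufferedUuidSource and its burstSize
parameter are hypothetical names, and java.util.UUID is used only as a
holder for the 128 bits.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.ArrayDeque;
import java.util.UUID;

// Hypothetical class: pre-generates ids in bursts to amortize SecureRandom cost.
final class BufferedUuidSource {
    // Shared instance, so the expensive seeding happens once per JVM.
    private static final SecureRandom RANDOM = new SecureRandom();

    private final ArrayDeque<UUID> queue = new ArrayDeque<>();
    private final int burstSize;

    BufferedUuidSource(int burstSize) {
        if (burstSize < 1) {
            throw new IllegalArgumentException("burstSize must be positive");
        }
        this.burstSize = burstSize;
    }

    /** Returns the next id, refilling the queue with a burst when it is empty. */
    synchronized UUID next() {
        if (queue.isEmpty()) {
            refill();
        }
        return queue.poll();
    }

    /** Number of ids currently waiting in the queue. */
    synchronized int buffered() {
        return queue.size();
    }

    private void refill() {
        // One nextBytes call supplies the random data for the whole burst.
        byte[] bytes = new byte[16 * burstSize];
        RANDOM.nextBytes(bytes);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        for (int i = 0; i < burstSize; i++) {
            long msb = (buffer.getLong() & 0xFFFFFFFFFFFF0FFFL) | 0x4000L;             // version 4
            long lsb = (buffer.getLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // variant 10
            queue.add(new UUID(msb, lsb));
        }
    }
}
```

Whether this actually helps would need measuring; it moves the cost into
occasional refills rather than removing it.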

> Some minimal xdocs, or more complete package documentation
> describing the
> implementation choices made would be a good thing to add about now.  Most
> of this is in the code or mail archives, but it would be good to get it
> into the package docs or xdocs.
>

Certainly. Hopefully, I can get this done as soon as possible - probably
within the next few days, sooner if I can.


> If you have not been following all of the commons-dev build
> stuff, you may
> have missed that you now need to co jakarta-commons-sandbox/sandbox-build
> to get the maven build to work.

I did manage to catch up on most of the list. I really appreciate Mark's
and others' efforts doing this. I didn't want to have to check out the
entire commons tree or have to think about selecting which bits to check
out aside from [id], so the change is welcome in my book; checking out as
two eclipse projects is much smoother. It also alleviates my need for the
commons.project.extendsUri=../sandbox-build/ which I was changing locally.
Not sure if you want to keep it or not?




Re: [id] UUID update

Posted by Phil Steitz <ph...@steitz.com>.
Tim Reilly wrote:
>>I will look at this stuff carefully this weekend, but one thing that
>>jumped out at me from your post above was that the "global lock" issue
>>might be avoidable by putting more into the node identifier, i.e., build
>>in a jvm identifier.  IIRC, this is essentially what tomcat does when
>>generating session ids to avoid collisions across jvms on the same host.
>>Just something to think about.
>>
>>Phil
>>
> 
> 
> I'm back. I'm thinking about this suggestion... My original concern is both
> with physical machines that run multiple jvm versions, as well as multiple
> instances of the same jvm (such as in an application server that is
> vertically clustered.)
> 
> I don't know of an identifier that I can get that uniquely identifies a jvm
> instance, but there may be something (I'll dig around the tomcat code.)
> Coincidentally, I do iterate the System properties and then create an MD5 of
> those concatenated with a random number and a (new Object().hashCode()) to
> generate an artificial node id in StateHelper (used in the
> InMemoryStateImpl). Keep in mind that *ideally* the Node.id is always the
> MAC address(es), though.

I investigated the tomcat code and concluded that my suggestion above is 
no better/different really than what you are doing and has the 
disadvantage of making the node id nonstandard (bad). AFAIK, there is no 
magical way to generate a guaranteed unique jvm identifier.

> I think I submitted quite a bit to digest in the zip file. Something that
> may make things easier all around might be if I start on creating patches
> just for the VersionFour uuid generator (random uuid)? We can work out
> naming, package structure, and how best to tie in the factory and
> IdentifierUtils.
> Afterwards, we can look at the decisions about the more complex/troublesome
> version 1 generator.
> If this sounds good to everyone I'll start a new thread around that?
> 
> 
I went ahead and committed the changes in the zip, with no changes other 
than updating the Apache license to 2.0. This is a good start. We need to 
get a better feel for stability / performance and some more eyeballs on 
this code, so I thought it best to get it into CVS now, even if we decide 
to refactor / repackage down the road. Thanks for the contribution.

Some minimal xdocs, or more complete package documentation describing the 
implementation choices made would be a good thing to add about now.  Most 
of this is in the code or mail archives, but it would be good to get it 
into the package docs or xdocs.

If you have not been following all of the commons-dev build stuff, you may 
have missed that you now need to co jakarta-commons-sandbox/sandbox-build 
to get the maven build to work.

Phil
