You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Tim Reilly <ti...@consultant.com> on 2004/01/11 05:34:41 UTC

[id] Need some help figuring this out

Below are some transact SQL (MS-SQL Server) statements that I ran in Query
Analyzer. I'm trying to figure out how it is that a UUID in character format
translates to the hexidecimal format SQLServer uses.

An example is:
String format is: B4F00409-CEF8-4822-802C-DEB20704C365
Hexidecimal format is: 0x0904F0B4F8CE2248802CDEB20704C365
I see a pattern, but I can't explain it. My first thought was that it's an
endian problem, but that doesn't seem to pan out. Anyone have any ideas?

The pattern I see is the last 2 fields are the same. But the high order
bytes / half are swapped
B4F00409-CEF8-4822-802C-DEB20704C365
  \ /     \/   \/  |||| ||||||||||||
  / \     /\   /\  |||| ||||||||||||
0904F0B4 F8CE 2248 802C DEB20704C365


//T-SQL Statements
DECLARE @tmpuuid uniqueidentifier
DECLARE @myuuid uniqueidentifier

SET @myuuid = newid()

print 'MyUUID String Format: '
print Convert(char(36), @myuuid)
print ''
print 'MyUUID Binary Format: '
print Convert(binary(16), @myuuid)
print ''
SET @tmpuuid = Convert(binary(16), @myuuid)
print 'tmpuuid converted back: '
print @tmpuuid

==OUTPUT==
MyUUID String Format:
B4F00409-CEF8-4822-802C-DEB20704C365

MyUUID Binary Format:
0x0904F0B4F8CE2248 802CDEB20704C365

tmpuuid converted back:
B4F00409-CEF8-4822-802C-DEB20704C365

== Reformatted ==
MyUUID String Format:
B4 F0 04 09 - CE F8- 48 22- 802C-DEB20704C365

MyUUID Binary Format:
0x 09 04 F0 B4 F8 CE 22 48  802C DEB20704C365

tmpuuid converted back:
B4F00409-CEF8-4822-802C-DEB20704C365


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [id] Need some help figuring this out

Posted by Phil Steitz <ph...@steitz.com>.
Hey Tim,

The implementation looks interesting.  So that we can discuss with code in 
front of us, I will commit this evening (er...morning, I guess it is). 
See interspersed.

Tim Reilly wrote:
> If I understand correctly the struct and the fact that its SqlServer on
> Intel P4 means I'm seeing little endian first in the DWORD and WORD fields?
> That makes the pattern intelligible:
> Type    Field   HexOctet
> DWORD   Data1 - [lsb0][lsb1][msb0][msb1]
> WORD    Data2 - [lsb][msb]
> WORD    Data3 - [lsb][msb]
> BYTE    Data4[8] - just bytes
> 
> I'm not sure what this should mean to o.a.c.id.uuid.UUID...since the UUID
> implementation I've worked out uses BigInteger to store the 128-bits, and I
> also allow a constructor from a BigInteger. I felt systems that wish to
> store the byte value and not the char value could benefit. (For example, to
> store in db2 I would use a the 16 byte value not the 36 byte string
> representation.) The IETF Draft spec says the 128 bit object should be
> composed of the various fields in network byte order, so it seems the
> numeric value returned from ms-sql is system specific? and the a.o.c.id.uuid
> implementation is ok by the spec?
> I'm going off of:
> http://www.ietf.org/internet-drafts/draft-mealling-uuid-urn-01.txt
> (Section: 4.1.2 Layout and byte order)

Interesting question.  The following statement in 4.1.2 sort of cryptic:

    "In the absence of explicit application or presentation protocol
    specification to the contrary, a UUID is encoded as a 128-bit object,
    as follows: the fields are encoded as 16 octets, with the sizes and
    order of the fields defined above, and with each field encoded with
    the Most Significant Byte first (this is known as network byte
    order)."

Section 3 of the spec provides BNF for canonical string representation of 
UUIDs for use as URNs and your implementation provides this as well.

I think that the spec clearly expects that implementors will provide 
binary representations and I agree that we should try to follow the layout 
in 4.1.2 exactly.  Using a BigInteger seems reasonable to me, but I am no 
expert on bit fiddling in Java.

> 
> I'm wondering if someone might try to construct an o.a.c.id.UUID from the
> hexidecimal value they pulled from MS-SQL? It may be a long shot, but
> perhaps a note in the documentation is needed?

Yes, we need to fully document how our impl works and state clearly all 
over the place that the hex representations that we expect are as 
described in section 3 of the spec.  If MS SQL Server produces string 
"UUIDs" that do not conform to the canonical URN spec in Sect. 3, then 
these can't be used to construct UUIDs without reformatting.

> e.g.d
>           String fromMSSQL = "0904F0B4F8CE2248802CDEB20704C365";
>           BigInteger bInt = new BigInteger(fromMSSQL, 16);
>           UUID myUUID = new UUID(bInt);
>           System.out.println(myUUID.toString());
> 	Outputs:
>           0904f0b4-f8ce-2248-802c-deb20704c365
> 	and Not B4F00409-CEF8-4822-802C-DEB20704C365
> and would not produce the same GUID from ms-sql to o.a.c.id.uuid.UUID
> because the hexidecimal representation from ms-sql is this way. Another
> option might be a new method
> parseMSSQLHexidecimal(String)?

I would be -0 to adding MS SQL-specific convenience methods, with the "-" 
because we would have to maintain these methods and their implementation 
might have to change each time a new server version is released.

Phil
> Any thoughts?
> 
> PS> Thanks Martin... I would not have figured out that out.
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


RE: [id] Need some help figuring this out

Posted by Tim Reilly <ti...@consultant.com>.
If I understand correctly the struct and the fact that its SqlServer on
Intel P4 means I'm seeing little endian first in the DWORD and WORD fields?
That makes the pattern intelligible:
Type    Field   HexOctet
DWORD   Data1 - [lsb0][lsb1][msb0][msb1]
WORD    Data2 - [lsb][msb]
WORD    Data3 - [lsb][msb]
BYTE    Data4[8] - just bytes

I'm not sure what this should mean to o.a.c.id.uuid.UUID...since the UUID
implementation I've worked out uses BigInteger to store the 128-bits, and I
also allow a constructor from a BigInteger. I felt systems that wish to
store the byte value and not the char value could benefit. (For example, to
store in db2 I would use a the 16 byte value not the 36 byte string
representation.) The IETF Draft spec says the 128 bit object should be
composed of the various fields in network byte order, so it seems the
numeric value returned from ms-sql is system specific? and the a.o.c.id.uuid
implementation is ok by the spec?
I'm going off of:
http://www.ietf.org/internet-drafts/draft-mealling-uuid-urn-01.txt
(Section: 4.1.2 Layout and byte order)

I'm wondering if someone might try to construct an o.a.c.id.UUID from the
hexidecimal value they pulled from MS-SQL? It may be a long shot, but
perhaps a note in the documentation is needed?
e.g.
          String fromMSSQL = "0904F0B4F8CE2248802CDEB20704C365";
          BigInteger bInt = new BigInteger(fromMSSQL, 16);
          UUID myUUID = new UUID(bInt);
          System.out.println(myUUID.toString());
	Outputs:
          0904f0b4-f8ce-2248-802c-deb20704c365
	and Not B4F00409-CEF8-4822-802C-DEB20704C365
and would not produce the same GUID from ms-sql to o.a.c.id.uuid.UUID
because the hexidecimal representation from ms-sql is this way. Another
option might be a new method
parseMSSQLHexidecimal(String)?

Any thoughts?

PS> Thanks Martin... I would not have figured out that out.



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [id] Need some help figuring this out

Posted by Martin Cooper <ma...@apache.org>.
On Sat, 10 Jan 2004, Tim Reilly wrote:

> Below are some transact SQL (MS-SQL Server) statements that I ran in Query
> Analyzer. I'm trying to figure out how it is that a UUID in character format
> translates to the hexidecimal format SQLServer uses.
>
> An example is:
> String format is: B4F00409-CEF8-4822-802C-DEB20704C365
> Hexidecimal format is: 0x0904F0B4F8CE2248802CDEB20704C365
> I see a pattern, but I can't explain it. My first thought was that it's an
> endian problem, but that doesn't seem to pan out. Anyone have any ideas?

It looks odd, doesn't it? But it all makes sense when you see the C/C++
type definition, which looks something like this:

typedef struct _GUID {
    DWORD   Data1;
    WORD    Data2;
    WORD    Data3;
    BYTE    Data4[8];
} GUID;

A DWORD is a 32-bit unsigned value, and a WORD is a 16-bit unsigned value.

The pattern you see is exactly right, and due to the fact that the Data1,
Data2 and Data3 fields have their bytes organised according to the types
specified in the typedef above.

--
Martin Cooper


>
> The pattern I see is the last 2 fields are the same. But the high order
> bytes / half are swapped
> B4F00409-CEF8-4822-802C-DEB20704C365
>   \ /     \/   \/  |||| ||||||||||||
>   / \     /\   /\  |||| ||||||||||||
> 0904F0B4 F8CE 2248 802C DEB20704C365
>
>
> //T-SQL Statements
> DECLARE @tmpuuid uniqueidentifier
> DECLARE @myuuid uniqueidentifier
>
> SET @myuuid = newid()
>
> print 'MyUUID String Format: '
> print Convert(char(36), @myuuid)
> print ''
> print 'MyUUID Binary Format: '
> print Convert(binary(16), @myuuid)
> print ''
> SET @tmpuuid = Convert(binary(16), @myuuid)
> print 'tmpuuid converted back: '
> print @tmpuuid
>
> ==OUTPUT==
> MyUUID String Format:
> B4F00409-CEF8-4822-802C-DEB20704C365
>
> MyUUID Binary Format:
> 0x0904F0B4F8CE2248 802CDEB20704C365
>
> tmpuuid converted back:
> B4F00409-CEF8-4822-802C-DEB20704C365
>
> == Reformatted ==
> MyUUID String Format:
> B4 F0 04 09 - CE F8- 48 22- 802C-DEB20704C365
>
> MyUUID Binary Format:
> 0x 09 04 F0 B4 F8 CE 22 48  802C DEB20704C365
>
> tmpuuid converted back:
> B4F00409-CEF8-4822-802C-DEB20704C365
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org