
Posted to user@hbase.apache.org by Michael Segel <mi...@hotmail.com> on 2010/03/14 18:41:04 UTC

UUID as key wuz: RE: worth choosing the shortest possible column names/keys?


UUID overkill?
Uhm, a UUID is a 128-bit key. That's what, 16 bytes in length? Definitely not 'overkill' if all you want the key to do is guarantee uniqueness. 
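A minimal Java sketch of the 16-byte figure, using only java.util.UUID and java.nio.ByteBuffer. It packs a random (version 4) UUID into a 16-byte array that could serve as a binary row key, then unpacks it again. The class and method names are illustrative, not part of HBase.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidKey {
    // Pack the two 64-bit halves of a UUID into 16 big-endian bytes.
    static byte[] toBytes(UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        return buf.array();
    }

    // Rebuild the UUID from a 16-byte key.
    static UUID fromBytes(byte[] key) {
        if (key.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + key.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(key);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();                     // version 4: 122 random bits
        byte[] key = toBytes(id);
        System.out.println(key.length);                  // 16
        System.out.println(id.toString().length());      // 36 (hex string with dashes)
        System.out.println(fromBytes(key).equals(id));   // true
    }
}
```

The 36-character string form of the same UUID is more than twice as long, so the binary form is the fair one to compare against an 8-byte alternative.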

Very easy to generate and extremely easy to use. You can even hash a name to create a version 5 (SHA-1, name-based) UUID.
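On the version 5 point: a version 5 UUID is derived by SHA-1 hashing a namespace UUID together with a name (RFC 4122, section 4.3), so the same name always yields the same UUID. Java's built-in UUID.nameUUIDFromBytes produces version 3 (MD5) UUIDs, so this sketch assembles version 5 by hand. The class name and the sample name "example.org" are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class NameUuid {
    // Namespace UUID for DNS names, as listed in RFC 4122, Appendix C.
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    // Version 5 (SHA-1, name-based) UUID.
    static UUID v5(UUID namespace, String name) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        // Hash the namespace bytes (big-endian) followed by the name bytes.
        ByteBuffer ns = ByteBuffer.allocate(16);
        ns.putLong(namespace.getMostSignificantBits());
        ns.putLong(namespace.getLeastSignificantBits());
        sha1.update(ns.array());
        sha1.update(name.getBytes(StandardCharsets.UTF_8));
        byte[] hash = sha1.digest();                  // 20 bytes; only the first 16 are used
        hash[6] = (byte) ((hash[6] & 0x0f) | 0x50);   // version nibble = 5
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80);   // RFC 4122 variant bits = 10
        ByteBuffer out = ByteBuffer.wrap(hash, 0, 16);
        long msb = out.getLong();
        long lsb = out.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID a = v5(NAMESPACE_DNS, "example.org");
        UUID b = v5(NAMESPACE_DNS, "example.org");
        System.out.println(a.version());   // 5
        System.out.println(a.equals(b));   // true: same namespace and name, same UUID
    }
}
```

Because name-based UUIDs are deterministic, they are only as unique as the names. They suit keys that should be recomputable from the data, while UUID.randomUUID() suits keys that only need to be unique.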

I don't understand why you'd want to try to generate an 8-byte key (you said 8 characters; I'm assuming you meant the Latin-1 character set) when you already have a standard way of doing it. 8 bytes vs. 16 bytes? C'mon....really?
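One consideration behind the 8-byte vs. 16-byte question: if IDs are minted randomly, with no coordination, the chance of a collision depends on the number of random bits. A back-of-envelope birthday bound, assuming n = 2 × 10^9 IDs (the "couple billion items" mentioned later in the thread), uniformly random bits, and the 122 random bits of a version 4 UUID:

```latex
$$p \approx 1 - e^{-n^2 / (2 \cdot 2^{b})}$$
$$b = 64:\quad \frac{(2\times10^{9})^2}{2\cdot 2^{64}} = \frac{4\times10^{18}}{3.69\times10^{19}} \approx 0.108, \qquad p \approx 1 - e^{-0.108} \approx 0.10$$
$$b = 122:\quad \frac{4\times10^{18}}{2\cdot 2^{122}} \approx \frac{4\times10^{18}}{1.06\times10^{37}} \approx 3.8\times10^{-19}$$
```

So 8 random bytes would give roughly a 1-in-10 chance of at least one duplicate at that scale, while a random UUID makes it negligible. An 8-byte key is only safe if whatever mints it guarantees uniqueness, as a counter does.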

JMHO

-Mike
 
> Date: Sat, 13 Mar 2010 09:01:38 +0100
> Subject: Re: worth choosing the shortest possible column names/keys?
> From: timrobertson100@gmail.com
> To: hbase-user@hadoop.apache.org
> 
> Along similar lines... (sorry for hijacking thread)
> 
> I assume that this is even more applicable to key choice, given the way keys
> participate in indexes?  I have been using UUIDs, but they are way overkill for
> my needs.  What are others using?  Is there a convenient way of generating (e.g.)
> 8-character strings?
> 

 		 	   		  

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Ryan Rawson <ry...@gmail.com>.

Everything you say is totally true.

One last comment: if your update rate is lowish and the IDs might
have some meaning, you might be better served by a counter, e.g.
user IDs (max value = 6 billion ;-)), or something else that might end up
needing to be semi-human-readable.
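To put numbers on the counter suggestion: 6 billion does not fit in a Java int (maximum 2,147,483,647), so the counter needs a long, i.e. 8 bytes in binary form. This sketch, an illustration rather than code from the thread, contrasts that compact form with a fixed-width, zero-padded decimal form that stays semi-human-readable. The 10-digit width is an assumption sized for a 6 billion ceiling. In both forms byte order matches numeric order, which matters because HBase sorts row keys as unsigned bytes. Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CounterKeys {
    static final long MAX_USER_ID = 6_000_000_000L; // the "6 billion" ceiling from the thread
    static final int DECIMAL_WIDTH = 10;             // digits in 6,000,000,000 (assumption)

    // Compact form: 8 big-endian bytes. For non-negative values, unsigned
    // byte-by-byte comparison matches numeric order.
    static byte[] binaryKey(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    // Readable form: fixed-width, zero-padded decimal, so byte order matches numeric order.
    static byte[] decimalKey(long id) {
        if (id < 0 || id > MAX_USER_ID) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        return String.format("%0" + DECIMAL_WIDTH + "d", id).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(MAX_USER_ID > Integer.MAX_VALUE);  // true: an int is not enough
        System.out.println(binaryKey(MAX_USER_ID).length);    // 8
        System.out.println(new String(decimalKey(42), StandardCharsets.US_ASCII)); // 0000000042
        // Unsigned lexicographic comparison, the order HBase uses for row keys.
        System.out.println(Arrays.compareUnsigned(decimalKey(9), decimalKey(10)) < 0); // true
    }
}
```

The decimal form costs 10 bytes instead of 8 here, which is the price of readability.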

-ryan

On Mon, Mar 15, 2010 at 4:11 PM, Michael Segel
<mi...@hotmail.com> wrote:
> And if you use a serial counter, you have a single counter and a single point of failure, or a point of contention.
> If you're running a Hadoop/MapReduce job and each node inserts into HBase as it runs, then you have to coordinate counter access.
>
> Using UUIDs, you don't have that problem. Of course, you also don't get the sequence that a counter gives you.

RE: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Michael Segel <mi...@hotmail.com>.



> Date: Mon, 15 Mar 2010 08:15:10 +0100
> Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
> From: timrobertson100@gmail.com
> To: hbase-user@hadoop.apache.org

> 
> Sure, understood.  UUID aims to be globally unique, whereas I am only
> looking for in-cluster uniqueness across a couple billion items, using an
> algorithm that allows IDs to be minted by machines in parallel.
> 
And if you use a serial counter, you have a single counter and a single point of failure, or a point of contention.
If you're running a Hadoop/MapReduce job and each node inserts into HBase as it runs, then you have to coordinate counter access.

Using UUIDs, you don't have that problem. Of course, you also don't get the sequence that a counter gives you.
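An in-process analogy for the UUID side of this trade-off, assuming threads stand in for the MapReduce tasks: each worker mints its own IDs with UUID.randomUUID(), and nothing is shared or coordinated while minting. This is a sketch only; it does not talk to HBase.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentMinting {
    // Each worker mints its own IDs; no state is shared between workers while minting.
    static Set<UUID> mint(int workers, int idsPerWorker) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<UUID>>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                futures.add(pool.submit(() -> {
                    List<UUID> ids = new ArrayList<>(idsPerWorker);
                    for (int i = 0; i < idsPerWorker; i++) {
                        ids.add(UUID.randomUUID()); // no coordination with other workers
                    }
                    return ids;
                }));
            }
            // Only this final merge sees every worker's output.
            Set<UUID> all = new HashSet<>();
            for (Future<List<UUID>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("minting failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Set<UUID> ids = mint(4, 10_000);
        System.out.println("distinct IDs: " + ids.size());
    }
}
```

No lock or shared counter is involved in generating an ID; the cost is the one Michael names: the IDs carry no order.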

 		 	   		  

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by tsuna <ts...@gmail.com>.

On Mon, Mar 15, 2010 at 12:21 AM, Tim Robertson
<ti...@gmail.com> wrote:
> How do you use incrementColumnValue
> to generate a row key, please?

You need a "special" row to act as a counter.  This row will typically
contain only a single cell, which stores the counter.  I like to use
the row key { 0 } (a byte array made of a single zeroed byte) for this
special row.  Then you just do an incrementColumnValue on that cell,
and the long you get back, you can transform to a byte array and there
you go, you've got your row key.  The counter row is only special in the
sense that it doesn't store actual data, but only a small piece of
metadata: the counter itself.
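A sketch of this counter-row recipe. To keep it self-contained and runnable, the HBase call is replaced by a hypothetical AtomicIncrementer interface with an in-memory implementation; in the HBase 0.20 client the corresponding call is HTable.incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount), which returns the new value. The column family and qualifier names are made up. The long is encoded as 8 big-endian bytes, the same layout HBase's Bytes.toBytes(long) produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CounterRowKeys {
    // Hypothetical stand-in for the one HBase call the recipe needs.
    interface AtomicIncrementer {
        long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount);
    }

    // In-memory implementation for illustration only (not HBase). ConcurrentHashMap.merge
    // applies the addition atomically per cell; a missing cell behaves like 0.
    static class InMemoryIncrementer implements AtomicIncrementer {
        private final Map<String, Long> cells = new ConcurrentHashMap<>();

        @Override
        public long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount) {
            String cell = Arrays.toString(row) + "/" + Arrays.toString(family)
                + "/" + Arrays.toString(qualifier);
            return cells.merge(cell, amount, Long::sum);
        }
    }

    static final byte[] COUNTER_ROW = {0};                                  // single zeroed byte
    static final byte[] FAMILY = "meta".getBytes(StandardCharsets.UTF_8);   // hypothetical
    static final byte[] QUALIFIER = "seq".getBytes(StandardCharsets.UTF_8); // hypothetical

    // Bump the counter cell and turn the returned long into an 8-byte, big-endian row key.
    static byte[] nextRowKey(AtomicIncrementer counter) {
        long id = counter.incrementColumnValue(COUNTER_ROW, FAMILY, QUALIFIER, 1L);
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    public static void main(String[] args) {
        AtomicIncrementer counter = new InMemoryIncrementer();
        System.out.println(Arrays.toString(nextRowKey(counter))); // [0, 0, 0, 0, 0, 0, 0, 1]
        System.out.println(Arrays.toString(nextRowKey(counter))); // [0, 0, 0, 0, 0, 0, 0, 2]
    }
}
```

Because every data key is exactly 8 bytes, none of them can equal the 1-byte counter row key { 0 }, which also sorts before all of them.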

Beware that this entails that all the new rows are always appended at
the end of the same region.  If your workload's performance depends on
your ability to create a large number of rows per second, then this
technique may prove inefficient as you may create a hot spot on the
one region that is being written to.
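A small simulation of the hot-spot effect, under made-up region boundaries. An HBase region holds a contiguous range of sorted row keys, so a row goes to the region with the greatest start key not above it. Since counter-based keys only grow, every new key sorts after all existing ones and lands in the last region. The TreeMap stands in for the region layout and is not an HBase API; Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.TreeMap;

class HotSpotDemo {
    // Hypothetical region layout, keyed by each region's start key (inclusive).
    // The first region starts at the empty key; the split points are made up.
    static final TreeMap<byte[], String> REGIONS =
        new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));

    static {
        REGIONS.put(new byte[0], "region-A");
        REGIONS.put(key(1_000_000L), "region-B");
        REGIONS.put(key(2_000_000L), "region-C");
    }

    // Counter value -> 8-byte big-endian row key.
    static byte[] key(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    // A row belongs to the region with the greatest start key <= the row key.
    static String regionFor(byte[] rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        long lastIssued = 2_500_000L; // pretend the counter has already reached this value
        for (long id = lastIssued + 1; id <= lastIssued + 3; id++) {
            System.out.println(id + " -> " + regionFor(key(id)));
        }
        // prints 2500001 -> region-C, 2500002 -> region-C, 2500003 -> region-C
    }
}
```

However the table is split, the newest keys always fall past the last split point, so one region absorbs every insert.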

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Ryan Rawson <ry...@gmail.com>.

You certainly can use this call to generate a row ID; it works just
like a sequence object (from Oracle/SQL land).

I think some people here are using it to generate row IDs. The code
should ensure that every number is unique and monotonically
increasing.

-ryan

On Mon, Mar 15, 2010 at 1:21 AM, Tim Robertson
<ti...@gmail.com> wrote:
> Thanks Ryan, sounds ideal
>
> How do you use
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[],%20byte[],%20byte[],%20long)
> to generate a row key, please?
>
> Thanks
> Tim

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Tim Robertson <ti...@gmail.com>.

Thanks Ryan, sounds ideal.

How do you use
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[],%20byte[],%20byte[],%20long)
to generate a row key, please?

Thanks
Tim





On Mon, Mar 15, 2010 at 9:12 AM, Ryan Rawson <ry...@gmail.com> wrote:

> You can use incrementColumnValue to generate sequential numbers.  The
> call is atomic and fast.  It supports thousands of calls/second in my
> testing.
>
> -ryan

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Ryan Rawson <ry...@gmail.com>.

You can use incrementColumnValue to generate sequential numbers.  The
call is atomic and fast.  It supports thousands of calls/second in my
testing.
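An in-process analogy for the atomicity described here, with java.util.concurrent.atomic.AtomicLong standing in for incrementColumnValue: several threads draw IDs from one shared counter, and because each increment is an atomic read-modify-write, the IDs come out unique and gap-free. It says nothing about HBase throughput (the thousands of calls/second figure is Ryan's own measurement), and it also makes visible the cost raised elsewhere in the thread: every worker goes through the same counter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class SharedSequence {
    // One shared counter: every caller goes through it. That is what makes the IDs
    // unique and gap-free, and also what makes it a single point of contention.
    private final AtomicLong counter = new AtomicLong();

    long next() {
        return counter.incrementAndGet(); // atomic read-modify-write; first value is 1
    }

    static List<Long> mint(int workers, int idsPerWorker) {
        SharedSequence seq = new SharedSequence();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                futures.add(pool.submit(() -> {
                    List<Long> ids = new ArrayList<>(idsPerWorker);
                    for (int i = 0; i < idsPerWorker; i++) {
                        ids.add(seq.next());
                    }
                    return ids;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());
            }
            Collections.sort(all);
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("minting failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Long> ids = mint(4, 1000);
        // Sorted, the IDs are exactly 1..4000: no duplicates and no gaps.
        System.out.println(ids.get(0) + " .. " + ids.get(ids.size() - 1)); // 1 .. 4000
    }
}
```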

-ryan

On Mon, Mar 15, 2010 at 12:15 AM, Tim Robertson
<ti...@gmail.com> wrote:
>>
>> Maybe I'm missing something, but the UUID is an artificial key; it's used to
>> guarantee uniqueness, and in this case you're using it as part of a key/value
>> pair.
>>
>
> Sure, understood.  UUID aims to be globally unique, whereas I am only
> looking for in-cluster uniqueness across a couple billion items, using an
> algorithm that allows IDs to be minted by machines in parallel.

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Tim Robertson <ti...@gmail.com>.

>
> Maybe I'm missing something, but the UUID is an artificial key; it's used to
> guarantee uniqueness, and in this case you're using it as part of a key/value
> pair.
>

Sure, understood.  UUID aims to be globally unique, whereas I am only
looking for in-cluster uniqueness across a couple billion items, using an
algorithm that allows IDs to be minted by machines in parallel.


> So why are you storing it in a Lucene index as the value?
>

Because I have various search indexes on the row, using combinations of
fields from the row.  I want the whole row accessible in the search results,
so I store only the row key (the row content is way too big for Lucene).
Lucene handles the search, providing the keys, and then the rows are pulled
and transformed while streaming out in the results.
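The key-only index pattern described above can be sketched with plain Java collections standing in for Lucene and HBase; the maps and data are hypothetical, not either library's API. The index holds only row keys per search term, and the full rows are fetched by key and transformed as the results stream out. Map.of and List.of need Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class KeyOnlyIndex {
    // Stand-in for the search index: each term maps only to the keys of matching rows.
    static final Map<String, List<String>> INDEX = Map.of(
        "color:red", List.of("row-1", "row-3"),
        "color:blue", List.of("row-2"));

    // Stand-in for the HBase table: full row content, looked up by row key.
    static final Map<String, String> TABLE = Map.of(
        "row-1", "first row",
        "row-2", "second row",
        "row-3", "third row");

    // Search yields keys; each row is fetched by key and transformed as it streams out.
    static Stream<String> search(String term) {
        return INDEX.getOrDefault(term, List.of()).stream()
            .map(TABLE::get)                 // pull the row by its key
            .map(row -> "result: " + row);   // placeholder for the real transformation
    }

    public static void main(String[] args) {
        System.out.println(search("color:red").collect(Collectors.toList()));
        // prints [result: first row, result: third row]
    }
}
```

Since the index stores nothing but keys, its size scales with key length, which is why the key size question matters for this design.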


> Look, the benefits of using the UUID definitely outweigh wrapping your own
> solution in 8 bytes, even in in-memory caches.
> (Are you only storing values that are 16 bytes in length, or something much
> larger?)


The values are much, much larger (100s to 1000s of bytes), but they aren't going
into any in-memory structures.
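A rough sense of scale for these value sizes, assuming an 8-byte difference per stored key (16-byte binary UUID vs. an 8-byte alternative), one value per key, and no compression. HBase stores the row key alongside every cell, so rows with many columns pay the key cost once per cell:

```latex
$$\frac{(16 - 8)\ \text{bytes}}{100\ \text{bytes}} = 8\%, \qquad \frac{(16 - 8)\ \text{bytes}}{1000\ \text{bytes}} = 0.8\%$$
```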




RE: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Michael Segel <mi...@hotmail.com>.


Maybe I'm missing something, but the UUID is an artificial key; it's used to guarantee uniqueness, and in this case you're using it as part of a key/value pair.

So why are you storing it in a Lucene index as the value?

Look, the benefits of using the UUID definitely outweigh wrapping your own solution in 8 bytes, even in in-memory caches.
(Are you only storing values that are 16 bytes in length, or something much larger?)


> Date: Sun, 14 Mar 2010 19:09:48 +0100
> Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
> From: timrobertson100@gmail.com
> To: hbase-user@hadoop.apache.org
> 
> Well, I could well be wrong, but my understanding is that there are memory-mapped
> index files using the key, so key choice would come into play for
> memory requirements here.  For secondary indexes, it has to be a factor for
> memory requirements; halving the size of the data you need to get in memory
> must be a good thing.  I am also building Lucene indexes storing only this
> key, so it influences their size a fair amount too.
>
> I know for sure that MySQL (MyISAM) B-tree index size is greatly affected by the
> size of the numeric types.  They are more complicated than my understanding
> of HBase indexing, but the same principles apply (if it ain't in memory, then
> you're into disk seeking).

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

Posted by Tim Robertson <ti...@gmail.com>.

Well, I could well be wrong, but my understanding is that there are memory-mapped
index files using the key, so key choice would come into play for
memory requirements here.  For secondary indexes, it has to be a factor for
memory requirements; halving the size of the data you need to get in memory
must be a good thing.  I am also building Lucene indexes storing only this
key, so it influences their size a fair amount too.

I know for sure that MySQL (MyISAM) B-tree index size is greatly affected by the
size of the numeric types.  They are more complicated than my understanding
of HBase indexing, but the same principles apply (if it ain't in memory, then
you're into disk seeking).
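Back-of-envelope arithmetic for an index that stores only the key, assuming n = 2 × 10^9 keys ("a couple billion items") and counting raw key bytes only, with no compression or per-entry index overhead:

```latex
$$16\ \text{B} \times 2\times10^{9} = 32\times10^{9}\ \text{B} \approx 32\ \text{GB} \quad (\text{binary UUID})$$
$$8\ \text{B} \times 2\times10^{9} = 16\times10^{9}\ \text{B} \approx 16\ \text{GB} \quad (\text{8-byte key})$$
$$36\ \text{B} \times 2\times10^{9} = 72\times10^{9}\ \text{B} \approx 72\ \text{GB} \quad (\text{UUID stored as 36-character text})$$
```

So halving the key does halve the raw key storage; whether that matters depends on how much of the index has to stay in memory.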

On Sun, Mar 14, 2010 at 6:41 PM, Michael Segel <mi...@hotmail.com> wrote:

>
>
> UUID overkill?
> Uhm, a UUID is a 128-bit key. That's what, 16 bytes in length? Definitely not
> 'overkill' if all you want the key to do is guarantee uniqueness.
>
> Very easy to generate and extremely easy to use. You can even hash a name to
> create a version 5 (SHA-1, name-based) UUID.
>
> I don't understand why you'd want to try to generate an 8-byte key (you said 8
> characters; I'm assuming you meant the Latin-1 character set) when you already
> have a standard way of doing it. 8 bytes vs. 16 bytes? C'mon....really?
>
> JMHO
>
> -Mike