Posted to user@hbase.apache.org by Rita <rm...@gmail.com> on 2012/09/15 17:09:16 UTC

lookup table

I am debating if a lookup table would help my situation.

I have a bunch of codes which map to timestamps (unsigned int). The codes
look like this:

AA4
AAA5
A21
A4
...
Z435

The sizes range from 1 character to 4 characters (1 to 4 bytes,
respectively).


Would adding a lookup table for all my codes help in reducing space? If so,
what would be the best way to hash something like this?
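One way to build such a lookup table, offered as a sketch and not something proposed in the thread, is dictionary encoding: instead of hashing (a 16-bit hash of ~9k codes would almost certainly produce collisions), give each distinct code the next sequential id and store that id as a fixed 2-byte big-endian key. The class name `CodeDictionary` is hypothetical, and the table lives only in memory here; persisting it (for example, in its own small HBase table) is left out.

```python
import struct


class CodeDictionary:
    """Hypothetical in-memory lookup table mapping each distinct code
    (e.g. "AA4", "Z435") to a sequential unsigned 16-bit id."""

    MAX_CODES = 1 << 16  # a 2-byte id can distinguish 65,536 codes

    def __init__(self):
        self._to_id = {}     # code -> id
        self._to_code = []   # id -> code (the list index is the id)

    def encode(self, code):
        """Return the 2-byte key for `code`, assigning a new id on first sight."""
        idx = self._to_id.get(code)
        if idx is None:
            idx = len(self._to_code)
            if idx >= self.MAX_CODES:
                raise OverflowError("more than 65,536 distinct codes")
            self._to_id[code] = idx
            self._to_code.append(code)
        # Big-endian, so byte-wise (lexicographic) order equals numeric id order.
        return struct.pack(">H", idx)

    def decode(self, key):
        """Map a 2-byte key back to the original code."""
        (idx,) = struct.unpack(">H", key)
        return self._to_code[idx]


d = CodeDictionary()
keys = [d.encode(c) for c in ["AA4", "AAA5", "A21", "A4", "Z435", "AA4"]]
```

Ids are handed out in first-seen order, so key order does not follow the alphabetical order of the codes; if scans need code order, the ids could instead be assigned after sorting the full code list.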




-- 
--- Get your facts first, then you can distort them as you please.--

Re: lookup table

Posted by Tom Brown <to...@gmail.com>.
If there are 9k possible entries in the lookup table, then to achieve
space savings the keys will need to be 1 or 2 bytes. For simplicity, let's
say you go with the 2-byte version. For 30 billion cells you will save at
most 2 bytes per cell (from 4 bytes to 2), for a total savings of 60 GB, and
at worst it will take more space, because the lookup keys will be longer than
the actual values being looked up.
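The best and worst cases above can be checked with a little arithmetic. This sketch assumes, for each line of output, that every cell holds a code of the same length, and it counts raw payload bytes only (HBase's per-cell key overhead and any compression are ignored); a real estimate would weight these figures by the actual distribution of code lengths. The helper name is illustrative.

```python
CELLS = 30_000_000_000  # ~30 billion cells/rows, the figure quoted in the thread
KEY_BYTES = 2           # fixed-width lookup key, as in the 2-byte example above


def bytes_saved(code_len, cells=CELLS, key_bytes=KEY_BYTES):
    """Raw bytes saved if every cell's `code_len`-byte code is replaced by a
    `key_bytes`-wide lookup key. A negative result means the data grows.
    Ignores HBase per-cell overhead and compression."""
    return cells * (code_len - key_bytes)


for n in range(1, 5):
    print(f"{n}-char codes: {bytes_saved(n) / 1e9:+.0f} GB")
```

Only 3- and 4-character codes shrink; 1-character codes each grow by a byte, which is the worst case described above.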

The added complexity of a lookup table would not make that savings worth it
to me, but you know your data best.

Just my $0.02

--Tom

On Sunday, September 16, 2012, Rita <rm...@gmail.com> wrote:
> Yes, I am trying to save on disk space because of limited resources and the
> table will be around 30 billion rows.
>
> The lookup table itself will be around 9k rows, so it's not too bad. Each
> code will be from 1 to 4 characters long.
>
> I suppose I really shouldn't worry about it too much.

Re: lookup table

Posted by Stack <st...@duboce.net>.
On Sun, Sep 16, 2012 at 4:27 PM, Rita <rm...@gmail.com> wrote:
> Yes, I am trying to save on disk space because of limited resources and the
> table will be around 30 billion rows.
>
> The lookup table itself will be around 9k rows, so it's not too bad. Each
> code will be from 1 to 4 characters long.
>
> I suppose I really shouldn't worry about it too much.
>

I'd agree (see Tom Brown's comment in the previous mail on this thread).
St.Ack

Re: lookup table

Posted by Rita <rm...@gmail.com>.
Yes, I am trying to save on disk space because of limited resources and the
table will be around 30 billion rows.

The lookup table itself will be around 9k rows, so it's not too bad. Each
code will be from 1 to 4 characters long.
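With about 9k codes, the width of a fixed-size lookup key follows from how many bits it takes to number them. In this sketch (the helper name is hypothetical), 9,000 ids need 14 bits, so 2 bytes; a 1-byte key covers only 256 codes. The lookup table itself is small by comparison: on the order of 54,000 bytes of raw code-plus-key payload, ignoring HBase overhead.

```python
def min_key_bytes(n_entries):
    """Smallest fixed key width, in bytes, that gives each of `n_entries`
    lookup-table rows a distinct id (ids 0 .. n_entries - 1)."""
    bits = max(1, (n_entries - 1).bit_length())
    return (bits + 7) // 8  # round bits up to whole bytes


N_CODES = 9_000  # approximate lookup-table size quoted in the thread

key_width = min_key_bytes(N_CODES)
# Upper bound on the lookup table's own raw payload: each row holds
# a code of at most 4 bytes plus its key.
table_bytes = N_CODES * (4 + key_width)
```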

I suppose I really shouldn't worry about it too much.





On Sun, Sep 16, 2012 at 6:16 PM, Stack <st...@duboce.net> wrote:

> You are trying to save on disk space?  You could make your keys binary,
> four bytes max, null-prefixed if fewer than 4 characters?  Why are you
> trying to save disk space?  You want a lookup table so you can have a code
> that is smaller than the 1-4 character codes?
>
> St.Ack
>




Re: lookup table

Posted by Stack <st...@duboce.net>.
On Sat, Sep 15, 2012 at 8:09 AM, Rita <rm...@gmail.com> wrote:
> I am debating if a lookup table would help my situation.
>
> I have a bunch of codes which map to timestamps (unsigned int). The codes
> look like this:
>
> AA4
> AAA5
> A21
> A4
> ...
> Z435
>
> The sizes range from 1 character to 4 characters (1 to 4 bytes,
> respectively).
>
>
> Would adding a lookup table for all my codes help in reducing space? If so,
> what would be the best way to hash something like this?
>

You are trying to save on disk space?  You could make your keys binary,
four bytes max, null-prefixed if fewer than 4 characters?  Why are you
trying to save disk space?  You want a lookup table so you can have a code
that is smaller than the 1-4 character codes?
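The fixed-width binary key suggested above, null-prefixed when a code is shorter than 4 characters, might look like the following sketch. It assumes the codes are plain ASCII letters and digits (as in the examples) and never contain a 0x00 byte; the function names are illustrative.

```python
WIDTH = 4  # the longest code is 4 characters


def to_fixed_key(code):
    """Encode a 1-4 character ASCII code as exactly 4 bytes,
    left-padded ("null prefixed") with 0x00 bytes."""
    raw = code.encode("ascii")
    if not 1 <= len(raw) <= WIDTH:
        raise ValueError(f"code must be 1-{WIDTH} characters: {code!r}")
    return raw.rjust(WIDTH, b"\x00")


def from_fixed_key(key):
    """Strip the 0x00 padding to recover the original code."""
    return key.lstrip(b"\x00").decode("ascii")


codes = ["AA4", "AAA5", "A21", "A4", "Z435"]
keys = [to_fixed_key(c) for c in codes]
```

One side effect worth knowing: HBase compares row keys byte by byte, so the 0x00 padding makes shorter codes sort ahead of longer ones (A4 before A21 before AA4), which differs from plain string order.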

St.Ack