Posted to user@hbase.apache.org by Eric <er...@gmail.com> on 2011/01/26 10:47:04 UTC

Data format in HBase

I'm wondering what the best way is to store my data in HBase. I'm currently
converting everything to a string and then to a byte array.
What are others doing? Plain text to byte arrays, eventually converting
your data back to floats, ints, etcetera?

Re: Data format in HBase

Posted by devush <de...@gmail.com>.
Hi,
I am not sure whether it is good practice to revive an old thread.

I just posted the same question, but I see it has already been answered here.

My question at this point: if my row keys are byte arrays, how will I be
able to read them? Only through a program? In the shell they are not
readable; we read decimal numbers more easily, although to a program it
makes no difference.

I might need this at the beginning, while we are still coding and
debugging. Later on we may not touch the content through the shell
at all.

How are other people managing bytes in row keys?
Thanks,
devush
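
One way to make binary row keys readable while debugging is to print them in an escaped form and decode them back to numbers in a small program. The sketch below is only an illustration under assumptions: RowKeyView and its methods are hypothetical names, and it uses java.nio.ByteBuffer from the standard library (big-endian, 8 bytes per long) instead of the HBase client so that it runs on its own. HBase's Bytes class has helpers for the same purpose, such as Bytes.toLong and Bytes.toStringBinary.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase: shows a binary row key in an
// escaped, shell-like form and decodes it back to a long.
final class RowKeyView {

    // Encode a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode the first 8 bytes of a key back into a long.
    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }

    // Printable ASCII is kept as-is; every other byte becomes \xNN.
    static String escape(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = encode(1234L);
        System.out.println(escape(key)); // escaped form of the raw bytes
        System.out.println(decode(key)); // the number a person wants to read
    }
}
```

For the key 1234 the escaped form is \x00\x00\x00\x00\x00\x00\x04\xD2, which is roughly what a binary key looks like in a shell, while decode gives back 1234.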

Re: Data format in HBase

Posted by Friso van Vollenhoven <fv...@xebia.com>.
There are indeed a number of toBytes(...) overloads, like Ryan said. When you have a fixed record type like this, using Bytes.toBytes(...) is likely the simplest and most compact thing to do. Protobuf and Avro are nice if you have records with optional fields or want to mix different types of records in one table, and also if you have records that may change over time (adding fields, etc.).

Friso


Re: Data format in HBase

Posted by Eric <er...@gmail.com>.
I've been looking at Avro and Protocol Buffers too. I'm storing multiple
properties, like a Tweet that has a user id, timestamp, message, etc. I
actually thought toBytes() would convert to string and then to bytes (stupid
assumption). I think I'll convert my Strings to the proper format (ints,
longs) and then use toBytes() because protobufs and Avro add too much
complexity in this case.
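
The approach described above (parse the strings into typed values, then write them in binary) could look roughly like the sketch below. Everything in it is an assumption for illustration: the TweetRecord class, its field names, and the layout that packs the fields into one byte array are hypothetical, and java.nio.ByteBuffer stands in for the HBase Bytes utility so that the example runs without the HBase client. In HBase each field could just as well be written to its own column with Bytes.toBytes(...).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical fixed record: two longs followed by a UTF-8 message.
final class TweetRecord {
    final long userId;
    final long timestampMillis;
    final String message;

    TweetRecord(long userId, long timestampMillis, String message) {
        this.userId = userId;
        this.timestampMillis = timestampMillis;
        this.message = message;
    }

    // Layout: 8-byte user id, 8-byte timestamp, then the message bytes.
    byte[] toBytes() {
        byte[] text = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 * Long.BYTES + text.length)
                .putLong(userId)
                .putLong(timestampMillis)
                .put(text)
                .array();
    }

    // Reads the fields back in the same order they were written.
    static TweetRecord fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long userId = buf.getLong();
        long timestampMillis = buf.getLong();
        byte[] text = new byte[buf.remaining()];
        buf.get(text);
        return new TweetRecord(userId, timestampMillis,
                new String(text, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Input arrives as strings and is parsed to proper types first.
        TweetRecord tweet = new TweetRecord(
                Long.parseLong("42"),
                Long.parseLong("1296035224000"),
                "hello hbase");
        byte[] data = tweet.toBytes();
        TweetRecord back = fromBytes(data);
        System.out.println(data.length + " bytes, user " + back.userId
                + ": " + back.message);
    }
}
```

A fixed layout like this stays compact because no field names or type tags are stored, but every reader has to know the field order, which is where Protobuf or Avro would start to pay off if the record changes.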

Re: Data format in HBase

Posted by Friso van Vollenhoven <fv...@xebia.com>.
We are using protobuf (http://code.google.com/apis/protocolbuffers/). That's not by any means a recommendation, just a possibility. What is your use case?

Friso



Re: Data format in HBase

Posted by Ryan Rawson <ry...@gmail.com>.
Check out the Bytes utility class, which has methods for converting longs to
byte arrays and vice versa. If you have numeric data, you can save space by
using the Bytes.toBytes(int) etc. calls instead of converting to a string
first. This can make it a bit harder to display in the shell, since it will
look like a hex dump rather than a number, but consider that a long can have
19 decimal digits yet takes up only 8 bytes, and you can see the attraction.

At SU we use binary storage, but it does make Hive interop a little harder.
But the savings can be substantial!
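
The size difference is easy to check. The sketch below is a standalone illustration, not HBase code: EncodingSize and its methods are hypothetical names, and java.nio.ByteBuffer is used to produce the 8-byte big-endian form that a binary long encoding such as Bytes.toBytes(long) yields.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of text versus binary encoding of a long.
final class EncodingSize {

    // Decimal digits as UTF-8: one byte per character.
    static byte[] asText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed 8-byte big-endian form, regardless of magnitude.
    static byte[] asBinary(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        long value = Long.MAX_VALUE; // 9223372036854775807, 19 digits
        System.out.println("text:   " + asText(value).length + " bytes");
        System.out.println("binary: " + asBinary(value).length + " bytes");
    }
}
```

For Long.MAX_VALUE this gives 19 bytes as text against 8 bytes in binary. The saving depends on magnitude: a small value such as 42 takes only 2 bytes as text but still 8 bytes as a binary long.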