Posted to user@hbase.apache.org by Ioakim Perros <im...@gmail.com> on 2012/07/24 21:40:27 UTC

Presplitting regions + Bulk import data into table

Hi,

I am bulk importing data through code and presplitting the regions of a 
table, but all of the data seems to end up on the first server.

The byte arrays to compare against (to decide which region each reducer's 
output should go to) are of the form: 
Bytes.toBytes(String.valueOf(#somenumber))

and the reducer's output key is an ImmutableBytesWritable whose bytes 
are formed like this:
byte[] ckBytes = Bytes.toBytes(String.valueOf(#reducer_task_id));

The thing is that the reducer (KeyValueSortReducer) class allows only 
ImmutableBytesWritable objects to be the key of each table's record.

Does anyone have an idea of how this comparison (between 
ImmutableBytesWritable and byte[]) is done, and what I should do in order 
to make the comparison work?

Thanks in advance!
IP
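
A note on the comparison asked about above: HBase orders row keys and split keys as raw bytes, lexicographically, not as numbers. The sketch below is not HBase code. It uses a hypothetical class, LexOrderDemo, with a hand-written compareBytes that only imitates that ordering, and a hypothetical paddedKey helper that shows one possible fixed-width encoding. It assumes the keys are decimal strings encoded as UTF-8, as in the post.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class; not part of HBase.
final class LexOrderDemo {

    // Unsigned, byte-by-byte lexicographic comparison.
    // If one array is a prefix of the other, the shorter one sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Same encoding as in the post: the decimal string of the number, as bytes.
    static byte[] key(int n) {
        return String.valueOf(n).getBytes(StandardCharsets.UTF_8);
    }

    // Assumed alternative: zero-pad non-negative numbers to a fixed width,
    // so that byte order and numeric order agree.
    static byte[] paddedKey(int n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "10" sorts before "9" because '1' < '9'.
        System.out.println(compareBytes(key(10), key(9)) < 0);                   // prints true
        // "0010" sorts after "0009", matching numeric order.
        System.out.println(compareBytes(paddedKey(10, 4), paddedKey(9, 4)) > 0); // prints true
    }
}
```

If the split keys and the row keys are both built with String.valueOf, any mismatch between numeric order and byte order could skew which region a key lands in, so it may be worth ruling this out first.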

Re: Presplitting regions + Bulk import data into table

Posted by Ioakim Perros <im...@gmail.com>.
Excuse me if I expressed the problem poorly, but what you propose is 
what I already do.

The problem is that although the output of my job has an 
ImmutableBytesWritable object as its key, the function used to define 
the split points of the table is the following:

public void createTable(HTableDescriptor desc,
                        byte[] startKey,
                        byte[] endKey,
                        int numRegions)

(HTableDescriptor: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html)


of the HBaseAdmin class, and the startKey and endKey have to be defined as 
byte[].

I believe this mismatch between ImmutableBytesWritable and byte[] 
objects is what causes the faulty comparison;
correct me if you believe I'm wrong.

Thank you very much for your response,
IP
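
On the suspected mismatch: to my understanding, ImmutableBytesWritable is only a wrapper around a byte[] and is compared by its byte content, so the wrapper type alone is unlikely to change where a key lands. What matters is how the key's bytes order against the split keys. The sketch below is a simplified stand-in, not HBase's actual partitioning code. RegionLookupDemo, regionIndex, and the split keys "3" and "6" are hypothetical, chosen only to illustrate the lookup.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of HBase.
final class RegionLookupDemo {

    // Unsigned, byte-by-byte lexicographic comparison.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Region 0 holds keys below splits[0]; region i holds keys k with
    // splits[i-1] <= k < splits[i]; the last region holds everything else.
    static int regionIndex(byte[][] sortedSplits, byte[] rowKey) {
        int idx = 0;
        while (idx < sortedSplits.length
                && compareBytes(rowKey, sortedSplits[idx]) >= 0) {
            idx++;
        }
        return idx;
    }

    // Same encoding as in the post: decimal string of the number, as bytes.
    static byte[] key(int n) {
        return String.valueOf(n).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] splits = { key(3), key(6) }; // assumed split points
        for (int n : new int[] { 1, 4, 7, 10, 25 }) {
            System.out.println(n + " -> region " + regionIndex(splits, key(n)));
        }
        // 1 -> region 0, 4 -> region 1, 7 -> region 2,
        // 10 -> region 0, 25 -> region 0
    }
}
```

With these assumed split points, every multi-digit key starting with '1' or '2' falls into the first region, which is one way data could pile up on a single server.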


On 07/25/2012 04:55 AM, Bryan Beaudreault wrote:
> Change the output of your job (or whatever you are using to seed this
> reducer -- mapper, whatever), to output ImmutableBytesWritable as the key.
>   Then wrap your bytes in the writable.  Basically, Bytes.toBytes() only
> returns a raw byte[] object.  You need an object that implements
> WritableComparable, and ImmutableBytesWritable is what you should use.  Use
> it like this:
>
> ImmutableBytesWritable outKey = new
> ImmutableBytesWritable(Bytes.toBytes(String.valueOf(#somenumber)));
>
> or use its setter:
>
> ImmutableBytesWritable outKey = new ImmutableBytesWritable();
> outKey.set(Bytes.toBytes(String.valueOf(#somenumber)));
>


Re: Presplitting regions + Bulk import data into table

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Change the output of your job (or whatever you are using to seed this
reducer: the mapper, or anything else) to emit ImmutableBytesWritable as
the key, then wrap your bytes in the writable.  Bytes.toBytes() only
returns a raw byte[] object.  You need an object that implements
WritableComparable, and ImmutableBytesWritable is what you should use.  Use
it like this:

ImmutableBytesWritable outKey = new
ImmutableBytesWritable(Bytes.toBytes(String.valueOf(#somenumber)));

or use its setter:

ImmutableBytesWritable outKey = new ImmutableBytesWritable();
outKey.set(Bytes.toBytes(String.valueOf(#somenumber)));
