Posted to dev@hbase.apache.org by Anoop Sam John <an...@huawei.com> on 2012/05/11 19:18:19 UTC

Usage of block encoding in bulk loading

Hi Devs
              When data is bulk loaded using HFileOutputFormat, I think we are not using the block encoding and the HBase-handled checksum features. When the writer is created for the HFile, I do not see any such information being passed to the WriterBuilder.
In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf) we do not have this information and do not pass it to the writer either, so those HFiles will not get these optimizations.
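
To make this concrete, here is a rough sketch of how getNewWriter() could wire the encoding in. This is only my sketch, not a patch: the builder methods are what I assume from the 0.94-era HFile.WriterFactory API, and getEncodingForFamily() is a hypothetical helper for however the per-family encoding would reach the task.

    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFile;
    import org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl;

    // Sketch: look up the family's encoding and hand it to the writer factory.
    // getEncodingForFamily() is hypothetical; the real fix would need to ship
    // the per-family encoding to the task, e.g. serialized into the job conf
    // the same way the compression map is handled today. fs, path, blockSize
    // and compression come from the existing getNewWriter() context.
    DataBlockEncoding encoding = getEncodingForFamily(family, conf);
    HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, path)
        .withBlockSize(blockSize)
        .withCompression(compression)
        .withDataBlockEncoder(new HFileDataBlockEncoderImpl(encoding)) // the missing piece
        .create();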

Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split an HFile (created by the MR job) when it cannot belong to just one region, I can see that we do pass the data block encoding and checksum details to the new HFile writer. But I think this step will not happen in the normal case.
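
For contrast, the split path already carries these settings across. Paraphrasing from memory of the 0.94 code (the names approximate the StoreFile.WriterBuilder API, not verbatim source):

    import org.apache.hadoop.hbase.regionserver.Store;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    // Paraphrase of copyHFileHalf's writer creation: the half-file writer
    // does receive the data block encoder and the checksum settings that the
    // normal bulk-load path above never sets. conf, cacheConf, fs, blocksize,
    // outFile, compression and dataBlockEncoder come from the surrounding
    // copyHFileHalf() context.
    StoreFile.Writer halfWriter =
        new StoreFile.WriterBuilder(conf, cacheConf, fs, blocksize)
            .withFilePath(outFile)
            .withCompression(compression)
            .withDataBlockEncoder(dataBlockEncoder)
            .withChecksumType(Store.getChecksumType(conf))
            .withBytesPerChecksum(Store.getBytesPerChecksum(conf))
            .build();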

Please correct me if my understanding is wrong.

Thanks
Anoop

RE: Usage of block encoding in bulk loading

Posted by Anoop Sam John <an...@huawei.com>.
Thanks Stack for your reply. I will work on this and give a patch soon...

-Anoop-
________________________________________
From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Saturday, May 12, 2012 10:08 AM
To: dev@hbase.apache.org
Subject: Re: Usage of block encoding in bulk loading

On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <an...@huawei.com> wrote:
> Hi Devs
>              When data is bulk loaded using HFileOutputFormat, I think we are not using the block encoding and the HBase-handled checksum features. When the writer is created for the HFile, I do not see any such information being passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf) we do not have this information and do not pass it to the writer either, so those HFiles will not get these optimizations.
>
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split an HFile (created by the MR job) when it cannot belong to just one region, I can see that we do pass the data block encoding and checksum details to the new HFile writer. But I think this step will not happen in the normal case.
>
> Please correct me if my understanding is wrong.
>

Sounds plausible Anoop.  Sounds like something worth fixing too?

Good on you,
St.Ack
