Posted to user@hbase.apache.org by Oleg Ruchovets <or...@gmail.com> on 2011/01/05 17:29:13 UTC

bulk upload for multiple column families

Hi,
I've read https://issues.apache.org/jira/browse/HBASE-1861 and the
mailing list posts regarding this issue.

My questions are:
1) Do I understand correctly that this feature will only be supported
in 0.92.0? If so, when do you plan to release that version? Is it
complicated to make the changes needed to support bulk loading
to multiple column families in the latest released version?
We are going to use it in production, and the most time-consuming job is
HBase insertion.
Originally it took 6 hours to write ~5Gb; after some tuning
(using compression, writing blocks/buffers) it takes ~2.5 hours, but that is
still a lot for us.

2) We are currently using HBase 0.20.3 and want to upgrade. Which version
should we use?

3) I've read as much as I could find about bulk loading but didn't find any
simple tutorial.

Thanks in advance
Oleg.

Re: bulk upload for multiple column families

Posted by Stack <st...@duboce.net>.
On Wed, Jan 5, 2011 at 8:29 AM, Oleg Ruchovets <or...@gmail.com> wrote:
> Hi,
> I've read https://issues.apache.org/jira/browse/HBASE-1861 and the
> mailing list posts regarding this issue.
>
> My questions are:
> 1) Do I understand correctly that this feature will only be supported
> in 0.92.0?

Yes.


> If so, when do you plan to release that version?

Soon after 0.90. (0.90 is about to put up its 4th release candidate).


> Is it
> complicated to make the changes needed to support bulk loading
> to multiple column families in the latest released version?


Try it.  IIRC, the patch was not too intrusive.  Download the last
patch posted here: https://review.cloudera.org//r/1272/#review2044


> We are going to use it in production, and the most time-consuming job is
> HBase insertion.
> Originally it took 6 hours to write ~5Gb; after some tuning
> (using compression, writing blocks/buffers) it takes ~2.5 hours, but that is
> still a lot for us.
>

That's a long time.  How many rows are you inserting?  Where do you think
the time is being spent?
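
For a rough sense of scale, the figures quoted above can be turned into an
average write rate. The sketch below is only back-of-the-envelope arithmetic:
it assumes "~5Gb" means about 5 gigabytes, takes 1 GB as 1024 MB, and uses a
hypothetical class name (InsertThroughput) and helper (megabytesPerSecond).
None of it is HBase API.

```java
import java.util.Locale;

public class InsertThroughput {
    // Average write rate in MB/s for `gigabytes` written over `hours`.
    // Assumption: 1 GB = 1024 MB; the inputs are the approximate figures
    // from the original post, not measured values.
    static double megabytesPerSecond(double gigabytes, double hours) {
        double megabytes = gigabytes * 1024.0;
        double seconds = hours * 3600.0;
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        // ~5 GB in 6 hours (before tuning) and in ~2.5 hours (after tuning).
        System.out.println(String.format(Locale.ROOT,
                "before tuning: %.2f MB/s", megabytesPerSecond(5.0, 6.0)));
        System.out.println(String.format(Locale.ROOT,
                "after tuning:  %.2f MB/s", megabytesPerSecond(5.0, 2.5)));
    }
}
```

Under those assumptions the rates come out to roughly 0.24 MB/s before tuning
and 0.57 MB/s after, which gives a baseline to compare against once the row
count and the slow stage are known.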


> 2) We are currently using HBase 0.20.3 and want to upgrade. Which version
> should we use?
>

Oh.  Ignore my advice above that suggests you try the patch against
your current hbase.  The patch won't apply to 0.20.x.  It would likely
work against 0.90.x.

You should at least update to 0.20.6.

0.90.0 should be out soon.  You might want to wait on that.


> 3) I've read as much as I could find about bulk loading but didn't find any
> simple tutorial.
>

Have you seen this? http://people.apache.org/~stack/hbase-0.90.0-candidate-2/docs/bulk-loads.html
(Bulk upload was rewritten for 0.90.x.)

St.Ack