Posted to user@hbase.apache.org by h <hb...@patientcentral.com> on 2011/03/22 18:59:38 UTC

We're seeing problems loading data into HBase using MR

Hey everyone,

I've got a situation where my data loads to HBase are failing.

The data is sent to an isolated HBase cluster from a different Hadoop cluster. What I see is that the performance is pretty bad: around 40k inserts/sec in bursts but only about 1k inserts/sec on average, with payloads of about 200 bytes. If I write a standalone Java client to hit the cluster, I can get a sustained 40k inserts/sec, or 80k ops/sec if I run another instance in a different window.

The network is all gigE. The region servers have a 4GB heap. Nothing outside of HBase is running on the cluster.
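A quick back-of-envelope check on these numbers suggests the network itself is unlikely to be the bottleneck. The sketch below multiplies each insert rate mentioned in this post by the ~200-byte payload and compares the result with the nominal gigE line rate (10^9 bits/s, i.e. 125,000,000 bytes/s). It counts payload bytes only: row keys, column names, timestamps and RPC framing are ignored, so real wire traffic will be somewhat higher.

```java
// Back-of-envelope check: how much of a gigabit link do the reported
// insert rates use? Rates and payload size come from the post; the link
// capacity is the nominal line rate, ignoring protocol overhead.
final class ThroughputCheck {
    // Nominal gigabit Ethernet: 10^9 bits/s = 125,000,000 bytes/s.
    static final long GIGE_BYTES_PER_SEC = 1_000_000_000L / 8;

    // Payload bytes per second for a given insert rate and payload size.
    static long payloadBytesPerSec(long opsPerSec, long payloadBytes) {
        return opsPerSec * payloadBytes;
    }

    // Fraction of the nominal gigE line rate used by that payload alone.
    static double fractionOfGigE(long opsPerSec, long payloadBytes) {
        return (double) payloadBytesPerSec(opsPerSec, payloadBytes) / GIGE_BYTES_PER_SEC;
    }

    public static void main(String[] args) {
        long payload = 200; // ~200-byte payloads, per the post
        long[] rates = {1_000, 40_000, 80_000, 250_000};
        for (long rate : rates) {
            System.out.printf("%,d ops/s -> %,d B/s (%.1f%% of gigE)%n",
                    rate, payloadBytesPerSec(rate, payload),
                    100 * fractionOfGigE(rate, payload));
        }
    }
}
```

By this arithmetic, 40k ops/sec is 8,000,000 B/s (about 6.4% of nominal gigE), and even 250k ops/sec is only about 40%, so the gap between the standalone client and the MR job points away from raw bandwidth.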

From the MR side, we see that the job eventually gets to 50% and then fails after reporting no status updates for 600 seconds. If we write a simple Java MR job that shoves about 10Gb of data through 20 reducers, it also chokes and dies.
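The 600-second figure matches Hadoop's default task timeout (`mapred.task.timeout`, 600000 ms in Hadoop 0.20-era configs): a task is killed if it neither reads an input, writes an output, nor updates its status for that long. One common pattern for long-running writes is to buffer puts on the client and signal progress after each batch. The sketch below is a hypothetical, stdlib-only illustration of that pattern; `BatchingWriter`, `sendBatch` and `reportProgress` are invented names standing in for an HBase client write path and a reducer's `context.progress()`, not real HBase or Hadoop APIs, and it does not diagnose why this particular job stalls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side batching writer (not an HBase or Hadoop API).
// Records are buffered until their total size reaches a byte limit, then
// handed to sendBatch as one batch; reportProgress runs after every batch,
// standing in for a reducer calling context.progress().
final class BatchingWriter {
    private final long bufferLimitBytes;
    private final Consumer<List<byte[]>> sendBatch;
    private final Runnable reportProgress;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    BatchingWriter(long bufferLimitBytes, Consumer<List<byte[]>> sendBatch, Runnable reportProgress) {
        if (bufferLimitBytes <= 0) {
            throw new IllegalArgumentException("bufferLimitBytes must be positive");
        }
        this.bufferLimitBytes = bufferLimitBytes;
        this.sendBatch = sendBatch;
        this.reportProgress = reportProgress;
    }

    // Buffer one record; flush once the buffered bytes reach the limit.
    void write(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (pendingBytes >= bufferLimitBytes) {
            flush();
        }
    }

    // Send everything buffered as one batch, then signal liveness.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sendBatch.accept(new ArrayList<>(pending)); // hand off a copy of the batch
        pending.clear();
        pendingBytes = 0;
        flushCount++;
        reportProgress.run();
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        long[] sent = {0};
        long[] pings = {0};
        // Assumed 2 MB buffer and 200-byte records, loosely matching the post.
        BatchingWriter writer = new BatchingWriter(2L * 1024 * 1024,
                batch -> sent[0] += batch.size(),
                () -> pings[0]++);
        byte[] record = new byte[200];
        for (int i = 0; i < 50_000; i++) {
            writer.write(record);
        }
        writer.flush(); // push the final partial batch
        System.out.println(sent[0] + " records in " + writer.flushCount()
                + " batches, " + pings[0] + " progress reports");
    }
}
```

The `main` above prints `50000 records in 5 batches, 5 progress reports`. Note that progress is only reported between batches, so a single batch that blocks on the server for longer than the timeout would still trip it.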

Is there anything we should be looking at? As a point of reference, on 0.26 we could push 250k ops/sec, with the same jobs averaging in the 150s. We also applied the META MEMSTORE_FLUSHSIZE fix (http://hbase.apache.org/book/upgrading.html).


Any help is greatly appreciated!

Thanks,
Dirk 

Re: We're seeing problems loading data into HBase using MR

Posted by Stack <st...@duboce.net>.
What do the regionserver logs say, Dirk? If you jstack one of them a
few times, do you see anything? Are they hung up on any call?
Post your configs to pastebin and we'll take a look.
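The advice to jstack a regionserver a few times amounts to sampling thread stacks repeatedly and looking for threads parked on the same call. The sketch below shows that idea in-process using `java.lang.management`: it takes several thread dumps a fixed interval apart and reports threads whose top stack frame never changed. It only inspects the JVM it runs in, so it illustrates the technique rather than replacing `jstack <pid>` against a live regionserver; an unchanged frame is a hint, not proof, since idle threads waiting for work look the same.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sample all thread stacks in this JVM several times and report threads whose
// top frame was identical in every sample (candidates for "hung on a call").
final class StuckThreadSampler {

    // Top stack frame of a thread as a string, or a marker if it has none.
    static String topFrame(ThreadInfo info) {
        StackTraceElement[] stack = info.getStackTrace();
        return stack.length == 0 ? "<no frames>" : stack[0].toString();
    }

    // One snapshot: thread id -> top frame; optionally records thread names.
    static Map<Long, String> snapshot(ThreadMXBean mx, Map<Long, String> namesOut) {
        Map<Long, String> frames = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) continue; // defensive: skip missing entries
            frames.put(info.getThreadId(), topFrame(info));
            if (namesOut != null) namesOut.put(info.getThreadId(), info.getThreadName());
        }
        return frames;
    }

    // Returns "name @ frame" for every thread whose top frame never changed.
    // If interrupted, stops early and reports on the samples taken so far.
    static Set<String> unchangedTopFrames(int samples, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> names = new HashMap<>();
        Map<Long, String> candidates = snapshot(mx, names);
        for (int i = 1; i < samples; i++) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
            Map<Long, String> now = snapshot(mx, null);
            // Drop threads that exited or whose top frame moved.
            candidates.entrySet().removeIf(e -> !e.getValue().equals(now.get(e.getKey())));
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<Long, String> e : candidates.entrySet()) {
            result.add(names.get(e.getKey()) + " @ " + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Three samples, two seconds apart, roughly like running jstack three times.
        for (String line : unchangedTopFrames(3, 2000)) {
            System.out.println(line);
        }
    }
}
```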

St.Ack
