Posted to dev@hbase.apache.org by Sunil <su...@bizosys.com> on 2011/12/07 04:34:41 UTC

Backup HBase to S3

We have evaluated various options to back up data inside HBase and built a
solution to back up HBase to S3. This post will explain the options and also
provide the code for anyone to download and implement it for their own HBase
installations.
http://blog.bizosys.com/2011/12/hbase-backup-to-amazon-s3.html

 

Thanks

Sunil


Re: Meta region hotspotting

Posted by Andrew Purtell <ap...@apache.org>.
> I do not know if this was fixed in 0.90.X, but it's possible that there is a bug with REST that causes a full scan of META for any misses. 


It's possible, but REST relies on HTable / HTablePool and tries to do the right thing with the client API to minimize REST-specific problems interacting with HBase, HBASE-5228 notwithstanding, a mistake for which I apologize.


Best regards


       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Ben West <bw...@yahoo.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Cc: 
> Sent: Thursday, January 19, 2012 9:46 AM
> Subject: Re: Meta region hotspotting
> 
> Thanks Jack.
> 
> We couldn't find any "bad" queries (though perhaps we weren't 
> looking hard enough) but an upgrade to 0.92rc3 seems to have resolved the 
> problem. Not sure why, but there were a couple of modifications to HTablePool that 
> might've done it.
> 
> Thanks for your help everyone,
> -Ben

Re: Meta region hotspotting

Posted by Ben West <bw...@yahoo.com>.
Thanks Jack.

We couldn't find any "bad" queries (though perhaps we weren't looking hard enough) but an upgrade to 0.92rc3 seems to have resolved the problem. Not sure why, but there were a couple of modifications to HTablePool that might've done it.

Thanks for your help everyone,
-Ben


----- Original Message -----
From: Jack Levin <ma...@gmail.com>
To: user@hbase.apache.org
Cc: Ben West <bw...@yahoo.com>
Sent: Monday, December 26, 2011 7:32 PM
Subject: Re: Meta region hotspotting

Some time ago, we had a situation where our REST server was slammed
with queries that did not find any matches for rows in HBase. When
that happened we sustained 50k rpc/sec to the META region server as
reported by the master web page. After digging deeper we found that
each request with a 'wrong' URL caused a full scan of META (more than
30k records for us). This was in version 0.89. I do not know if
this was fixed in 0.90.X, but it's possible that there is a bug with
REST that causes a full scan of META for any misses. Just something to
check. (As soon as we removed the erroneous queries to REST, this
issue was resolved.)

-Jack



Re: Meta region hotspotting

Posted by Jack Levin <ma...@gmail.com>.
Some time ago, we had a situation where our REST server was slammed
with queries that did not find any matches for rows in HBase. When
that happened we sustained 50k rpc/sec to the META region server as
reported by the master web page. After digging deeper we found that
each request with a 'wrong' URL caused a full scan of META (more than
30k records for us). This was in version 0.89. I do not know if
this was fixed in 0.90.X, but it's possible that there is a bug with
REST that causes a full scan of META for any misses. Just something to
check. (As soon as we removed the erroneous queries to REST, this
issue was resolved.)

-Jack

On Fri, Dec 23, 2011 at 5:42 PM, Stack <st...@duboce.net> wrote:
> On Fri, Dec 23, 2011 at 1:42 PM, Ben West <bw...@yahoo.com> wrote:
>> For those following at home, I tried this change and it does not appear to have solved the problem. Back to the drawing board.
>>
>>
>
> Thanks for digging in Ben.
> St.Ack

Re: Meta region hotspotting

Posted by Stack <st...@duboce.net>.
On Fri, Dec 23, 2011 at 1:42 PM, Ben West <bw...@yahoo.com> wrote:
> For those following at home, I tried this change and it does not appear to have solved the problem. Back to the drawing board.
>
>

Thanks for digging in Ben.
St.Ack

Re: Meta region hotspotting

Posted by Ben West <bw...@yahoo.com>.
For those following at home, I tried this change and it does not appear to have solved the problem. Back to the drawing board.


----- Original Message -----
From: Stack <sa...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: "user@hbase.apache.org" <us...@hbase.apache.org>
Sent: Thursday, December 15, 2011 7:08 PM
Subject: Re: Meta region hotspotting

Can you try without a limit, or with an upped limit, and see if it makes a difference? Sounds plausible, yes, but it looks like you could establish it as fact with some small experiments.

Thanks





Re: Meta region hotspotting

Posted by Stack <sa...@gmail.com>.
Can you try without a limit, or with an upped limit, and see if it makes a difference? Sounds plausible, yes, but it looks like you could establish it as fact with some small experiments.

Thanks



On Dec 15, 2011, at 12:25 PM, Ben West <bw...@yahoo.com> wrote:

> Digging into this further, I see the following in HTablePool:
> 
>   public void putTable(HTableInterface table) {
>     LinkedList<HTableInterface> queue = tables.get(Bytes.toString(table.getTableName()));
>     synchronized(queue) {
>       if(queue.size() >= maxSize) {
>         // release table instance since we're not reusing it
>         this.tableFactory.releaseHTableInterface(table);
>         return;
>       }
>       queue.add(table);
>     }
>   }
> 
> maxSize is hard-coded to 10, so I think if there are more than 10 simultaneous requests, future requests will get brand-new tables, lacking any cache. 
> 
> Does this sound plausible? I can submit a JIRA asking for maxSize to be a config param if so. We have >> 10 simultaneous requests.

Re: Meta region hotspotting

Posted by Ben West <bw...@yahoo.com>.
Digging into this further, I see the following in HTablePool:

  public void putTable(HTableInterface table) {
    LinkedList<HTableInterface> queue = tables.get(Bytes.toString(table.getTableName()));
    synchronized(queue) {
      if(queue.size() >= maxSize) {
        // release table instance since we're not reusing it
        this.tableFactory.releaseHTableInterface(table);
        return;
      }
      queue.add(table);
    }
  }

maxSize is hard-coded to 10, so I think if there are more than 10 simultaneous requests, future requests will get brand-new tables, lacking any cache. 

Does this sound plausible? I can submit a JIRA asking for maxSize to be a config param if so. We have >> 10 simultaneous requests.
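A minimal single-threaded Java model of Ben's hypothesis follows. It assumes, as Ben suggests, that each pooled table instance carries its own region-location cache, and it copies only the shape of the quoted putTable() logic: a returned table beyond maxSize is discarded, and a request that finds the pool empty gets a fresh instance with a cold cache. `FakeTable`, `BoundedPool`, and `simulate` are hypothetical names, not HBase classes, and this thread does not settle whether the real client keeps its cache per table or in a shared connection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolSimulation {
    /** Hypothetical table handle with a private region-location cache (Ben's assumption). */
    static final class FakeTable {
        private final Set<String> cachedRegions = new HashSet<>();
        private final AtomicInteger metaLookups; // shared counter of simulated META lookups

        FakeTable(AtomicInteger metaLookups) { this.metaLookups = metaLookups; }

        /** On a cache miss, count one META lookup and remember the location. */
        void locate(String region) {
            if (cachedRegions.add(region)) {
                metaLookups.incrementAndGet();
            }
        }
    }

    /** Bounded pool following the shape of the putTable() excerpt: returns beyond maxSize are dropped. */
    static final class BoundedPool {
        private final Deque<FakeTable> queue = new ArrayDeque<>();
        private final int maxSize;
        private final AtomicInteger metaLookups;

        BoundedPool(int maxSize, AtomicInteger metaLookups) {
            this.maxSize = maxSize;
            this.metaLookups = metaLookups;
        }

        FakeTable getTable() {
            FakeTable t = queue.poll();
            return (t != null) ? t : new FakeTable(metaLookups); // empty pool: fresh, cold cache
        }

        void putTable(FakeTable t) {
            if (queue.size() >= maxSize) {
                return; // "released": this instance and its warm cache are discarded
            }
            queue.add(t);
        }
    }

    /** Runs `rounds` bursts of `concurrent` overlapping requests; returns total META lookups. */
    static int simulate(int maxSize, int concurrent, int rounds) {
        AtomicInteger metaLookups = new AtomicInteger();
        BoundedPool pool = new BoundedPool(maxSize, metaLookups);
        for (int r = 0; r < rounds; r++) {
            List<FakeTable> inFlight = new ArrayList<>();
            for (int i = 0; i < concurrent; i++) { // every request borrows before any returns
                FakeTable t = pool.getTable();
                t.locate("region-0001");
                inFlight.add(t);
            }
            for (FakeTable t : inFlight) {
                pool.putTable(t);
            }
        }
        return metaLookups.get();
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 10, 3)); // 10: only the first burst misses
        System.out.println(simulate(10, 15, 3)); // 25: 15, then 5 cold tables per later burst
    }
}
```

With 15 overlapping requests and a limit of 10, every burst after the first still costs 5 META lookups in this model. Raising the limit to at least the peak concurrency (for example `simulate(20, 15, 3)`, which yields 15) leaves only the first burst's misses, which is the kind of small experiment Stack suggests.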


----- Original Message -----
From: Ben West <bw...@yahoo.com>
To: Stack <st...@duboce.net>; "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Friday, December 9, 2011 10:52 AM
Subject: Re: Meta region hotspotting

Thanks Stack.

Is there any way I can log requests going to .META. vs. region servers? I've tried setting various portions of HBase to DEBUG but I haven't found the magic combo yet.



Re: Meta region hotspotting

Posted by Ben West <bw...@yahoo.com>.
Thanks Stack.

Is there any way I can log requests going to .META. vs. region servers? I've tried setting various portions of HBase to DEBUG but I haven't found the magic combo yet.


----- Original Message -----
From: Stack <st...@duboce.net>
To: user@hbase.apache.org; Ben West <bw...@yahoo.com>
Cc: 
Sent: Thursday, December 8, 2011 6:24 PM
Subject: Re: Meta region hotspotting

On Thu, Dec 8, 2011 at 4:05 PM, Ben West <bw...@yahoo.com> wrote:
> We have a cluster with four region servers and about 2,000 regions. We're using the REST server, and we've noticed that whatever region is hosting META gets 3-5x the number of requests that the other regions do.
>
> It's my understanding that the client should cache the row start/end locations, and hence should need to go to META only rarely after a little bit. But I'm sustaining tens of thousands of requests per second for many minutes and they all still seem to hit META - does anyone know why this would be?
>

Your understanding is correct. I'd think that the caching would
eventually pick up all 2k region locations, unless REST is making a
new client on every invocation, or every few invocations (I've not
checked).

St.Ack


Re: Meta region hotspotting

Posted by Stack <st...@duboce.net>.
On Thu, Dec 8, 2011 at 4:05 PM, Ben West <bw...@yahoo.com> wrote:
> We have a cluster with four region servers and about 2,000 regions. We're using the REST server, and we've noticed that whatever region is hosting META gets 3-5x the number of requests that the other regions do.
>
> It's my understanding that the client should cache the row start/end locations, and hence should need to go to META only rarely after a little bit. But I'm sustaining tens of thousands of requests per second for many minutes and they all still seem to hit META - does anyone know why this would be?
>

Your understanding is correct. I'd think that the caching would
eventually pick up all 2k region locations, unless REST is making a
new client on every invocation, or every few invocations (I've not
checked).

St.Ack
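Stack's description of the client-side cache can be illustrated with a small Java sketch: locations are cached in a sorted map keyed by region start key, a row is resolved with `TreeMap.floorEntry`, and META is consulted only on a miss. `Region`, `FakeMeta`, and `Client` are hypothetical stand-ins, not HBase classes; the real client also has to handle things this sketch ignores, such as invalidating entries when a region moves.

```java
import java.util.Map;
import java.util.TreeMap;

final class RegionCacheSketch {
    /** Hypothetical region covering [startKey, endKey); an empty endKey means "to the end of the table". */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** Stand-in for META: every region keyed by start key, plus a count of lookups served. */
    static final class FakeMeta {
        final TreeMap<String, Region> regions = new TreeMap<>();
        int lookups = 0;

        void add(String start, String end) { regions.put(start, new Region(start, end)); }

        Region lookup(String row) {
            lookups++;
            Map.Entry<String, Region> e = regions.floorEntry(row); // greatest start key <= row
            return e == null ? null : e.getValue();
        }
    }

    /** Client-side location cache: check locally, fall back to META only on a miss. */
    static final class Client {
        private final TreeMap<String, Region> cache = new TreeMap<>();
        private final FakeMeta meta;

        Client(FakeMeta meta) { this.meta = meta; }

        Region locate(String row) {
            Map.Entry<String, Region> e = cache.floorEntry(row);
            if (e != null && e.getValue().contains(row)) {
                return e.getValue(); // cache hit: no META traffic
            }
            Region r = meta.lookup(row);
            if (r != null) {
                cache.put(r.startKey, r);
            }
            return r;
        }
    }

    /** Sends `requests` row lookups over four regions; returns how many reached META. */
    static int metaLookups(boolean reuseClient, int requests) {
        FakeMeta meta = new FakeMeta();
        meta.add("", "g");
        meta.add("g", "n");
        meta.add("n", "t");
        meta.add("t", "");
        String[] rows = {"a", "h", "p", "z"}; // one row in each region
        Client shared = new Client(meta);
        for (int i = 0; i < requests; i++) {
            Client c = reuseClient ? shared : new Client(meta); // a new client starts with an empty cache
            c.locate(rows[i % rows.length]);
        }
        return meta.lookups;
    }

    public static void main(String[] args) {
        System.out.println(metaLookups(true, 1000));  // 4: one per region
        System.out.println(metaLookups(false, 1000)); // 1000: every request goes to META
    }
}
```

A long-lived client settles into one META lookup per region, while creating a new client for every request sends every request to META, which would be consistent with the symptom Ben describes.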

Meta region hotspotting

Posted by Ben West <bw...@yahoo.com>.
Hey all,

We have a cluster with four region servers and about 2,000 regions. We're using the REST server, and we've noticed that whatever region is hosting META gets 3-5x the number of requests that the other regions do. 

It's my understanding that the client should cache the row start/end locations, and hence should need to go to META only rarely after a little bit. But I'm sustaining tens of thousands of requests per second for many minutes and they all still seem to hit META - does anyone know why this would be?

We're using 0.90.4.

Thanks!
-Ben


Re: Backup HBase to S3

Posted by lars hofhansl <lh...@yahoo.com>.
Just committed HBASE-4682.
Should be able to make this work now with deletes.



________________________________
 From: lars hofhansl <lh...@yahoo.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org> 
Sent: Wednesday, December 7, 2011 10:38 AM
Subject: Re: Backup HBase to S3
 
From the blog: 

"However, in this method deletes are not captured which may lead to some unnecessary data in tables."

Would it help if I finally got to work on HBASE-4682? (Or maybe somebody wants to volunteer; all the pieces are there in trunk.)


-- Lars

Re: Backup HBase to S3

Posted by lars hofhansl <lh...@yahoo.com>.
From the blog: 

"However, in this method deletes are not captured which may lead to some unnecessary data in tables."

Would it help if I finally got to work on HBASE-4682? (Or maybe somebody wants to volunteer; all the pieces are there in trunk.)


-- Lars
________________________________
From: Sunil <su...@bizosys.com>
To: dev@hbase.apache.org; user@hbase.apache.org 
Sent: Tuesday, December 6, 2011 7:34 PM
Subject: Backup HBase to S3

We have evaluated various options to back up data inside HBase and built a
solution to back up HBase to S3. This post will explain the options and also
provide the code for anyone to download and implement it for their own HBase
installations.
http://blog.bizosys.com/2011/12/hbase-backup-to-amazon-s3.html



Thanks

Sunil
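The caveat lars quotes from the blog can be shown with a toy Java model. This sketch assumes an incremental backup that copies cells written after the previous backup's timestamp, and a store where a delete leaves a marker (tombstone) that a normal read skips. `Cell`, `incrementalExport`, and the `raw` flag are hypothetical; the flag only loosely mirrors the idea of reading delete markers and is not presented as how HBASE-4682 is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class IncrementalBackupSketch {
    /** Hypothetical single-version cell: a value, or a delete marker when value is null. */
    static final class Cell {
        final String value;
        final long ts;

        Cell(String value, long ts) {
            this.value = value;
            this.ts = ts;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Copies cells written after `since` into `backup`; only a raw export carries deletes across. */
    static void incrementalExport(Map<String, Cell> source, Map<String, String> backup,
                                  long since, boolean raw) {
        for (Map.Entry<String, Cell> e : source.entrySet()) {
            Cell c = e.getValue();
            if (c.ts <= since) {
                continue; // unchanged since the previous backup
            }
            if (!c.isTombstone()) {
                backup.put(e.getKey(), c.value);
            } else if (raw) {
                backup.remove(e.getKey()); // replay the delete on the backup copy
            } // else: a normal read never returns the tombstone, so the delete is lost
        }
    }

    /** Full backup at t=1, then a delete and an insert at t=2, then an incremental backup. */
    static Map<String, String> run(boolean raw) {
        Map<String, Cell> source = new HashMap<>();
        Map<String, String> backup = new HashMap<>();
        source.put("row1", new Cell("a", 1));
        source.put("row2", new Cell("b", 1));
        incrementalExport(source, backup, 0, raw);
        source.put("row2", new Cell(null, 2)); // row2 deleted at t=2
        source.put("row3", new Cell("c", 2));  // row3 written at t=2
        incrementalExport(source, backup, 1, raw);
        return backup;
    }

    public static void main(String[] args) {
        System.out.println(new TreeMap<>(run(false))); // {row1=a, row2=b, row3=c}
        System.out.println(new TreeMap<>(run(true)));  // {row1=a, row3=c}
    }
}
```

Without the markers, row2 survives in the backup even though it was deleted at the source. An export that relies on delete markers also has to run before a major compaction, which normally purges them.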
