Posted to user@hbase.apache.org by Jack Levin <ma...@gmail.com> on 2010/11/23 18:31:25 UTC

question about meta data query intensity

Hello, I am curious if there is a potential bottleneck in .META.
ownership by a single region server.  Is it possible (safe) to split
meta region into several?

-Jack

Re: REST compression support (was Re: question about meta data query intensity)

Posted by Jack Levin <ma...@gmail.com>.
That's great! Thanks. This will help us reduce network context
switching, since it removes the need to pass a lot of uncompressed packets.

-Jack

On Wed, Nov 24, 2010 at 10:15 AM, Andrew Purtell <ap...@apache.org> wrote:
> Regards compressing the HTTP transactions between the REST server and REST client we punted on this back when Stargate had a WAR target so we could push that off to the servlet container configuration. Thanks for the question, which reminded me... I have just committed HBASE-3275, which is a trivial patch to support Accept-Encoding: gzip,deflate
>
> Index: src/main/java/org/apache/hadoop/hbase/rest/Main.java
> ===================================================================
> --- src/main/java/org/apache/hadoop/hbase/rest/Main.java        (revision 1038732)
> +++ src/main/java/org/apache/hadoop/hbase/rest/Main.java        (working copy)
> @@ -37,6 +37,7 @@
>  import org.mortbay.jetty.Server;
>  import org.mortbay.jetty.servlet.Context;
>  import org.mortbay.jetty.servlet.ServletHolder;
> +import org.mortbay.servlet.GzipFilter;
>
>  import com.sun.jersey.spi.container.servlet.ServletContainer;
>
> @@ -132,6 +133,7 @@
>       // set up context
>     Context context = new Context(server, "/", Context.SESSIONS);
>     context.addServlet(sh, "/*");
> +    context.addFilter(GzipFilter.class, "/*", 0);
>
>     server.start();
>     server.join();
>
> Regards interactions between HBase client and server, there is no option available for compressing Hadoop RPC.
>
>  - Andy
>
>
> --- On Wed, 11/24/10, Jack Levin <ma...@gmail.com> wrote:
>
>> From: Jack Levin <ma...@gmail.com>
>> Subject: Re: question about meta data query intensity
>> To: user@hbase.apache.org, apurtell@apache.org
>> Date: Wednesday, November 24, 2010, 9:25 AM
>>
>> Yes, but that does not alleviate CPU contention should there be too
>> many queries to a single region server.   On a separate topic, is
>> 'compression' in the works for REST gateway?   Similar to
>> mysql_client_compression?  We plan to drop in 500K or
>> more queries at a time into the REST, and it would be interesting
>> to see the performance gain against uncompressed data.
>>
>> Thanks.
>>
>> -Jack
>
>
>
>
>

Re: REST compression support (was Re: question about meta data query intensity)

Posted by Jack Levin <ma...@gmail.com>.
Btw, does it mean I can send in a compressed query?  Or only receive
compressed data from REST, or both?

-Jack

On Wed, Nov 24, 2010 at 10:15 AM, Andrew Purtell <ap...@apache.org> wrote:
> Regards compressing the HTTP transactions between the REST server and REST client we punted on this back when Stargate had a WAR target so we could push that off to the servlet container configuration. Thanks for the question, which reminded me... I have just committed HBASE-3275, which is a trivial patch to support Accept-Encoding: gzip,deflate
>
> Index: src/main/java/org/apache/hadoop/hbase/rest/Main.java
> ===================================================================
> --- src/main/java/org/apache/hadoop/hbase/rest/Main.java        (revision 1038732)
> +++ src/main/java/org/apache/hadoop/hbase/rest/Main.java        (working copy)
> @@ -37,6 +37,7 @@
>  import org.mortbay.jetty.Server;
>  import org.mortbay.jetty.servlet.Context;
>  import org.mortbay.jetty.servlet.ServletHolder;
> +import org.mortbay.servlet.GzipFilter;
>
>  import com.sun.jersey.spi.container.servlet.ServletContainer;
>
> @@ -132,6 +133,7 @@
>       // set up context
>     Context context = new Context(server, "/", Context.SESSIONS);
>     context.addServlet(sh, "/*");
> +    context.addFilter(GzipFilter.class, "/*", 0);
>
>     server.start();
>     server.join();
>
> Regards interactions between HBase client and server, there is no option available for compressing Hadoop RPC.
>
>  - Andy
>
>
> --- On Wed, 11/24/10, Jack Levin <ma...@gmail.com> wrote:
>
>> From: Jack Levin <ma...@gmail.com>
>> Subject: Re: question about meta data query intensity
>> To: user@hbase.apache.org, apurtell@apache.org
>> Date: Wednesday, November 24, 2010, 9:25 AM
>>
>> Yes, but that does not alleviate CPU contention should there be too
>> many queries to a single region server.   On a separate topic, is
>> 'compression' in the works for REST gateway?   Similar to
>> mysql_client_compression?  We plan to drop in 500K or
>> more queries at a time into the REST, and it would be interesting
>> to see the performance gain against uncompressed data.
>>
>> Thanks.
>>
>> -Jack
>
>
>
>
>

Re: question about meta data query intensity

Posted by Stack <st...@duboce.net>.
I will get you a jar next week.  I'd like to test it on my side before
passing it to you.  Bug me if I forget.
St.Ack

On Wed, Nov 24, 2010 at 11:53 AM, Jack Levin <ma...@gmail.com> wrote:
> Yes, I am game.
>
> -Jack
>
> On Wed, Nov 24, 2010 at 11:02 AM, Stack <st...@duboce.net> wrote:
>> If you are game for deploying an instrumented jar, we could log client
>> lookups in .META. and try and figure if it profligate.
>> St.Ack
>>
>> On Wed, Nov 24, 2010 at 9:25 AM, Jack Levin <ma...@gmail.com> wrote:
>>> Yes, but that does not alleviate CPU contention should there be too
>>> many queries to a single region server.   On a separate topic, is
>>> 'compression' in the works for REST gateway?   Similar to
>>> mysql_client_compression?  We plan to drop in 500K or more queries at
>>> a time into the REST, and it would be interesting to see the
>>> performance gain against uncompressed data.
>>>
>>> Thanks.
>>>
>>> -Jack
>>>
>>> On Wed, Nov 24, 2010 at 9:04 AM, Andrew Purtell <ap...@apache.org> wrote:
>>>> The REST gateway (Stargate) is a long lived client. :-)
>>>>
>>>> It uses HTablePool internally so this will keep some warm table references around in addition to the region location caching that HConnectionManager does behind the scenes. (10 references, but this could be made configurable.)
>>>>
>>>> Best regards,
>>>>
>>>>    - Andy
>>>>
>>>> --- On Tue, 11/23/10, Jack Levin <ma...@gmail.com> wrote:
>>>>
>>>>> From: Jack Levin <ma...@gmail.com>
>>>>> Subject: Re: question about meta data query intensity
>>>>> To: user@hbase.apache.org
>>>>> Date: Tuesday, November 23, 2010, 11:06 AM
>>>>> its REST, and generally no long lived
>>>>> clients, yes, caching of regions
>>>>> helps however, we expect long tail hits that will be
>>>>> uncached, which
>>>>> may stress out meta region, that being said, is it possible
>>>>> create
>>>>> affinity and nail meta region into a beefy server or set of
>>>>> beefy
>>>>> servers?
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com>
>>>>> wrote:
>>>>> > Are you going to have long-lived clients?  How are
>>>>> you accessing HBase?  REST or Thrift gateways?  Caching of
>>>>> region locations should help significantly so that it's only
>>>>> a bottleneck right at the startup of the
>>>>> cluster/gateways/clients.
>>>>> >
>>>>> >> -----Original Message-----
>>>>> >> From: Jack Levin [mailto:magnito@gmail.com]
>>>>> >> Sent: Tuesday, November 23, 2010 10:53 AM
>>>>> >> To: user@hbase.apache.org
>>>>> >> Subject: Re: question about meta data query
>>>>> intensity
>>>>> >>
>>>>> >> my concern is that we plane to have 120
>>>>> regionservers with 1000
>>>>> >> Regions each, so the hits to meta could be quite
>>>>> intense.  (why so
>>>>> >> many regions? we are storing 1 Petabyte of data of
>>>>> images into hbase).
>>>>> >>
>>>>> >> -Jack
>>>>> >>
>>>>> >> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray
>>>>> <jg...@fb.com>
>>>>> wrote:
>>>>> >> > It is possible that it could be a bottleneck
>>>>> but usually is not.
>>>>> >>  Generally production HBase installations have
>>>>> long-lived clients, so
>>>>> >> the client-side caching is sufficient to reduce
>>>>> the amount of load to
>>>>> >> META (virtually 0 clean cluster is at steady-state
>>>>> / no region
>>>>> >> movement).
>>>>> >> >
>>>>> >> > For MapReduce, you do make new clients but
>>>>> generally only need to
>>>>> >> query for one region per task.
>>>>> >> >
>>>>> >> > It is not currently possible to split META.
>>>>>  We hard-coded some stuff
>>>>> >> a while back to make things easier and in the name
>>>>> of correctness.
>>>>> >> >
>>>>> >> > HBASE-3171 is about removing the ROOT region
>>>>> and putting the META
>>>>> >> region(s) locations into ZK directly.  When we
>>>>> make that change, we
>>>>> >> could probably also re-enable the splitting of
>>>>> META.
>>>>> >> >
>>>>> >> > JG
>>>>> >> >
>>>>> >> >> -----Original Message-----
>>>>> >> >> From: Jack Levin [mailto:magnito@gmail.com]
>>>>> >> >> Sent: Tuesday, November 23, 2010 9:31 AM
>>>>> >> >> To: user@hbase.apache.org
>>>>> >> >> Subject: question about meta data query
>>>>> intensity
>>>>> >> >>
>>>>> >> >> Hello, I am curious if there is a
>>>>> potential bottleneck in .META.
>>>>> >> >> ownership by a single region server.  Is
>>>>> it possible (safe) to split
>>>>> >> >> meta region into several?
>>>>> >> >>
>>>>> >> >> -Jack
>>>>> >> >
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
Yes, I am game.

-Jack

On Wed, Nov 24, 2010 at 11:02 AM, Stack <st...@duboce.net> wrote:
> If you are game for deploying an instrumented jar, we could log client
> lookups in .META. and try and figure if it profligate.
> St.Ack
>
> On Wed, Nov 24, 2010 at 9:25 AM, Jack Levin <ma...@gmail.com> wrote:
>> Yes, but that does not alleviate CPU contention should there be too
>> many queries to a single region server.   On a separate topic, is
>> 'compression' in the works for REST gateway?   Similar to
>> mysql_client_compression?  We plan to drop in 500K or more queries at
>> a time into the REST, and it would be interesting to see the
>> performance gain against uncompressed data.
>>
>> Thanks.
>>
>> -Jack
>>
>> On Wed, Nov 24, 2010 at 9:04 AM, Andrew Purtell <ap...@apache.org> wrote:
>>> The REST gateway (Stargate) is a long lived client. :-)
>>>
>>> It uses HTablePool internally so this will keep some warm table references around in addition to the region location caching that HConnectionManager does behind the scenes. (10 references, but this could be made configurable.)
>>>
>>> Best regards,
>>>
>>>    - Andy
>>>
>>> --- On Tue, 11/23/10, Jack Levin <ma...@gmail.com> wrote:
>>>
>>>> From: Jack Levin <ma...@gmail.com>
>>>> Subject: Re: question about meta data query intensity
>>>> To: user@hbase.apache.org
>>>> Date: Tuesday, November 23, 2010, 11:06 AM
>>>> its REST, and generally no long lived
>>>> clients, yes, caching of regions
>>>> helps however, we expect long tail hits that will be
>>>> uncached, which
>>>> may stress out meta region, that being said, is it possible
>>>> create
>>>> affinity and nail meta region into a beefy server or set of
>>>> beefy
>>>> servers?
>>>>
>>>> -Jack
>>>>
>>>> On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com>
>>>> wrote:
>>>> > Are you going to have long-lived clients?  How are
>>>> you accessing HBase?  REST or Thrift gateways?  Caching of
>>>> region locations should help significantly so that it's only
>>>> a bottleneck right at the startup of the
>>>> cluster/gateways/clients.
>>>> >
>>>> >> -----Original Message-----
>>>> >> From: Jack Levin [mailto:magnito@gmail.com]
>>>> >> Sent: Tuesday, November 23, 2010 10:53 AM
>>>> >> To: user@hbase.apache.org
>>>> >> Subject: Re: question about meta data query
>>>> intensity
>>>> >>
>>>> >> my concern is that we plane to have 120
>>>> regionservers with 1000
>>>> >> Regions each, so the hits to meta could be quite
>>>> intense.  (why so
>>>> >> many regions? we are storing 1 Petabyte of data of
>>>> images into hbase).
>>>> >>
>>>> >> -Jack
>>>> >>
>>>> >> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray
>>>> <jg...@fb.com>
>>>> wrote:
>>>> >> > It is possible that it could be a bottleneck
>>>> but usually is not.
>>>> >>  Generally production HBase installations have
>>>> long-lived clients, so
>>>> >> the client-side caching is sufficient to reduce
>>>> the amount of load to
>>>> >> META (virtually 0 clean cluster is at steady-state
>>>> / no region
>>>> >> movement).
>>>> >> >
>>>> >> > For MapReduce, you do make new clients but
>>>> generally only need to
>>>> >> query for one region per task.
>>>> >> >
>>>> >> > It is not currently possible to split META.
>>>>  We hard-coded some stuff
>>>> >> a while back to make things easier and in the name
>>>> of correctness.
>>>> >> >
>>>> >> > HBASE-3171 is about removing the ROOT region
>>>> and putting the META
>>>> >> region(s) locations into ZK directly.  When we
>>>> make that change, we
>>>> >> could probably also re-enable the splitting of
>>>> META.
>>>> >> >
>>>> >> > JG
>>>> >> >
>>>> >> >> -----Original Message-----
>>>> >> >> From: Jack Levin [mailto:magnito@gmail.com]
>>>> >> >> Sent: Tuesday, November 23, 2010 9:31 AM
>>>> >> >> To: user@hbase.apache.org
>>>> >> >> Subject: question about meta data query
>>>> intensity
>>>> >> >>
>>>> >> >> Hello, I am curious if there is a
>>>> potential bottleneck in .META.
>>>> >> >> ownership by a single region server.  Is
>>>> it possible (safe) to split
>>>> >> >> meta region into several?
>>>> >> >>
>>>> >> >> -Jack
>>>> >> >
>>>> >
>>>>
>>>
>>>
>>>
>>>
>>
>

Re: question about meta data query intensity

Posted by Stack <st...@duboce.net>.
If you are game for deploying an instrumented jar, we could log client
lookups in .META. and try to figure out if it is profligate.
St.Ack

On Wed, Nov 24, 2010 at 9:25 AM, Jack Levin <ma...@gmail.com> wrote:
> Yes, but that does not alleviate CPU contention should there be too
> many queries to a single region server.   On a separate topic, is
> 'compression' in the works for REST gateway?   Similar to
> mysql_client_compression?  We plan to drop in 500K or more queries at
> a time into the REST, and it would be interesting to see the
> performance gain against uncompressed data.
>
> Thanks.
>
> -Jack
>
> On Wed, Nov 24, 2010 at 9:04 AM, Andrew Purtell <ap...@apache.org> wrote:
>> The REST gateway (Stargate) is a long lived client. :-)
>>
>> It uses HTablePool internally so this will keep some warm table references around in addition to the region location caching that HConnectionManager does behind the scenes. (10 references, but this could be made configurable.)
>>
>> Best regards,
>>
>>    - Andy
>>
>> --- On Tue, 11/23/10, Jack Levin <ma...@gmail.com> wrote:
>>
>>> From: Jack Levin <ma...@gmail.com>
>>> Subject: Re: question about meta data query intensity
>>> To: user@hbase.apache.org
>>> Date: Tuesday, November 23, 2010, 11:06 AM
>>> its REST, and generally no long lived
>>> clients, yes, caching of regions
>>> helps however, we expect long tail hits that will be
>>> uncached, which
>>> may stress out meta region, that being said, is it possible
>>> create
>>> affinity and nail meta region into a beefy server or set of
>>> beefy
>>> servers?
>>>
>>> -Jack
>>>
>>> On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com>
>>> wrote:
>>> > Are you going to have long-lived clients?  How are
>>> you accessing HBase?  REST or Thrift gateways?  Caching of
>>> region locations should help significantly so that it's only
>>> a bottleneck right at the startup of the
>>> cluster/gateways/clients.
>>> >
>>> >> -----Original Message-----
>>> >> From: Jack Levin [mailto:magnito@gmail.com]
>>> >> Sent: Tuesday, November 23, 2010 10:53 AM
>>> >> To: user@hbase.apache.org
>>> >> Subject: Re: question about meta data query
>>> intensity
>>> >>
>>> >> my concern is that we plane to have 120
>>> regionservers with 1000
>>> >> Regions each, so the hits to meta could be quite
>>> intense.  (why so
>>> >> many regions? we are storing 1 Petabyte of data of
>>> images into hbase).
>>> >>
>>> >> -Jack
>>> >>
>>> >> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray
>>> <jg...@fb.com>
>>> wrote:
>>> >> > It is possible that it could be a bottleneck
>>> but usually is not.
>>> >>  Generally production HBase installations have
>>> long-lived clients, so
>>> >> the client-side caching is sufficient to reduce
>>> the amount of load to
>>> >> META (virtually 0 clean cluster is at steady-state
>>> / no region
>>> >> movement).
>>> >> >
>>> >> > For MapReduce, you do make new clients but
>>> generally only need to
>>> >> query for one region per task.
>>> >> >
>>> >> > It is not currently possible to split META.
>>>  We hard-coded some stuff
>>> >> a while back to make things easier and in the name
>>> of correctness.
>>> >> >
>>> >> > HBASE-3171 is about removing the ROOT region
>>> and putting the META
>>> >> region(s) locations into ZK directly.  When we
>>> make that change, we
>>> >> could probably also re-enable the splitting of
>>> META.
>>> >> >
>>> >> > JG
>>> >> >
>>> >> >> -----Original Message-----
>>> >> >> From: Jack Levin [mailto:magnito@gmail.com]
>>> >> >> Sent: Tuesday, November 23, 2010 9:31 AM
>>> >> >> To: user@hbase.apache.org
>>> >> >> Subject: question about meta data query
>>> intensity
>>> >> >>
>>> >> >> Hello, I am curious if there is a
>>> potential bottleneck in .META.
>>> >> >> ownership by a single region server.  Is
>>> it possible (safe) to split
>>> >> >> meta region into several?
>>> >> >>
>>> >> >> -Jack
>>> >> >
>>> >
>>>
>>
>>
>>
>>
>

REST compression support (was Re: question about meta data query intensity)

Posted by Andrew Purtell <ap...@apache.org>.
Regarding compression of the HTTP transactions between the REST server and REST client: we punted on this back when Stargate had a WAR target, so we could push it off to the servlet container configuration. Thanks for the question, which reminded me... I have just committed HBASE-3275, which is a trivial patch to support Accept-Encoding: gzip,deflate

Index: src/main/java/org/apache/hadoop/hbase/rest/Main.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/rest/Main.java        (revision 1038732)
+++ src/main/java/org/apache/hadoop/hbase/rest/Main.java        (working copy)
@@ -37,6 +37,7 @@
 import org.mortbay.jetty.Server;
 import org.mortbay.jetty.servlet.Context;
 import org.mortbay.jetty.servlet.ServletHolder;
+import org.mortbay.servlet.GzipFilter;
 
 import com.sun.jersey.spi.container.servlet.ServletContainer;
 
@@ -132,6 +133,7 @@
       // set up context
     Context context = new Context(server, "/", Context.SESSIONS);
     context.addServlet(sh, "/*");
+    context.addFilter(GzipFilter.class, "/*", 0);
 
     server.start();
     server.join();
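
With the filter registered, compression is negotiated per request: a client only
needs to advertise gzip support and decompress the response if the gateway
compressed it. A minimal client-side sketch (the gateway host, port, table and
row below are hypothetical, not part of the patch):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipRestGet {
  public static void main(String[] args) throws Exception {
    // Hypothetical gateway host/port, table and row; adjust to your deployment.
    URL url = new URL("http://resthost:8080/images/row1");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "text/xml");
    conn.setRequestProperty("Accept-Encoding", "gzip");  // ask the GzipFilter to compress the response
    InputStream in = conn.getInputStream();
    if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
      in = new GZIPInputStream(in);                      // transparently decompress
    }
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
    for (String line; (line = reader.readLine()) != null; ) {
      System.out.println(line);
    }
    reader.close();
  }
}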

Regarding interactions between the HBase client and server, there is no option available for compressing Hadoop RPC.

  - Andy


--- On Wed, 11/24/10, Jack Levin <ma...@gmail.com> wrote:

> From: Jack Levin <ma...@gmail.com>
> Subject: Re: question about meta data query intensity
> To: user@hbase.apache.org, apurtell@apache.org
> Date: Wednesday, November 24, 2010, 9:25 AM
>
> Yes, but that does not alleviate CPU contention should there be too
> many queries to a single region server.   On a separate topic, is
> 'compression' in the works for REST gateway?   Similar to
> mysql_client_compression?  We plan to drop in 500K or
> more queries at a time into the REST, and it would be interesting
> to see the performance gain against uncompressed data.
> 
> Thanks.
> 
> -Jack



      

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
Yes, but that does not alleviate CPU contention should there be too
many queries to a single region server.   On a separate topic, is
'compression' in the works for the REST gateway?   Similar to
mysql_client_compression?  We plan to drop 500K or more queries at
a time into the REST gateway, and it would be interesting to see the
performance gain against uncompressed data.

Thanks.

-Jack

On Wed, Nov 24, 2010 at 9:04 AM, Andrew Purtell <ap...@apache.org> wrote:
> The REST gateway (Stargate) is a long lived client. :-)
>
> It uses HTablePool internally so this will keep some warm table references around in addition to the region location caching that HConnectionManager does behind the scenes. (10 references, but this could be made configurable.)
>
> Best regards,
>
>    - Andy
>
> --- On Tue, 11/23/10, Jack Levin <ma...@gmail.com> wrote:
>
>> From: Jack Levin <ma...@gmail.com>
>> Subject: Re: question about meta data query intensity
>> To: user@hbase.apache.org
>> Date: Tuesday, November 23, 2010, 11:06 AM
>> its REST, and generally no long lived
>> clients, yes, caching of regions
>> helps however, we expect long tail hits that will be
>> uncached, which
>> may stress out meta region, that being said, is it possible
>> create
>> affinity and nail meta region into a beefy server or set of
>> beefy
>> servers?
>>
>> -Jack
>>
>> On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com>
>> wrote:
>> > Are you going to have long-lived clients?  How are
>> you accessing HBase?  REST or Thrift gateways?  Caching of
>> region locations should help significantly so that it's only
>> a bottleneck right at the startup of the
>> cluster/gateways/clients.
>> >
>> >> -----Original Message-----
>> >> From: Jack Levin [mailto:magnito@gmail.com]
>> >> Sent: Tuesday, November 23, 2010 10:53 AM
>> >> To: user@hbase.apache.org
>> >> Subject: Re: question about meta data query
>> intensity
>> >>
>> >> my concern is that we plane to have 120
>> regionservers with 1000
>> >> Regions each, so the hits to meta could be quite
>> intense.  (why so
>> >> many regions? we are storing 1 Petabyte of data of
>> images into hbase).
>> >>
>> >> -Jack
>> >>
>> >> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray
>> <jg...@fb.com>
>> wrote:
>> >> > It is possible that it could be a bottleneck
>> but usually is not.
>> >>  Generally production HBase installations have
>> long-lived clients, so
>> >> the client-side caching is sufficient to reduce
>> the amount of load to
>> >> META (virtually 0 clean cluster is at steady-state
>> / no region
>> >> movement).
>> >> >
>> >> > For MapReduce, you do make new clients but
>> generally only need to
>> >> query for one region per task.
>> >> >
>> >> > It is not currently possible to split META.
>>  We hard-coded some stuff
>> >> a while back to make things easier and in the name
>> of correctness.
>> >> >
>> >> > HBASE-3171 is about removing the ROOT region
>> and putting the META
>> >> region(s) locations into ZK directly.  When we
>> make that change, we
>> >> could probably also re-enable the splitting of
>> META.
>> >> >
>> >> > JG
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Jack Levin [mailto:magnito@gmail.com]
>> >> >> Sent: Tuesday, November 23, 2010 9:31 AM
>> >> >> To: user@hbase.apache.org
>> >> >> Subject: question about meta data query
>> intensity
>> >> >>
>> >> >> Hello, I am curious if there is a
>> potential bottleneck in .META.
>> >> >> ownership by a single region server.  Is
>> it possible (safe) to split
>> >> >> meta region into several?
>> >> >>
>> >> >> -Jack
>> >> >
>> >
>>
>
>
>
>

Re: question about meta data query intensity

Posted by Andrew Purtell <ap...@apache.org>.
The REST gateway (Stargate) is a long-lived client. :-)

It uses HTablePool internally, so this will keep some warm table references around in addition to the region location caching that HConnectionManager does behind the scenes. (10 references, but this could be made configurable.)
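
As an illustration of that pooling, a minimal sketch of checking a table out of
an HTablePool and returning it (the pool size of 10 mirrors the default
mentioned above; the table and row names are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PooledGet {
  public static void main(String[] args) throws Exception {
    // Pool of up to 10 warm HTable references per table name.
    HTablePool pool = new HTablePool(new HBaseConfiguration(), 10);
    HTableInterface table = pool.getTable("images");     // placeholder table name
    try {
      Result r = table.get(new Get(Bytes.toBytes("row1")));
      System.out.println(r);
    } finally {
      pool.putTable(table);  // return the table, and its cached region locations, to the pool
    }
  }
}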

Best regards,

    - Andy

--- On Tue, 11/23/10, Jack Levin <ma...@gmail.com> wrote:

> From: Jack Levin <ma...@gmail.com>
> Subject: Re: question about meta data query intensity
> To: user@hbase.apache.org
> Date: Tuesday, November 23, 2010, 11:06 AM
> its REST, and generally no long lived
> clients, yes, caching of regions
> helps however, we expect long tail hits that will be
> uncached, which
> may stress out meta region, that being said, is it possible
> create
> affinity and nail meta region into a beefy server or set of
> beefy
> servers?
> 
> -Jack
> 
> On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com>
> wrote:
> > Are you going to have long-lived clients?  How are
> you accessing HBase?  REST or Thrift gateways?  Caching of
> region locations should help significantly so that it's only
> a bottleneck right at the startup of the
> cluster/gateways/clients.
> >
> >> -----Original Message-----
> >> From: Jack Levin [mailto:magnito@gmail.com]
> >> Sent: Tuesday, November 23, 2010 10:53 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: question about meta data query
> intensity
> >>
> >> my concern is that we plane to have 120
> regionservers with 1000
> >> Regions each, so the hits to meta could be quite
> intense.  (why so
> >> many regions? we are storing 1 Petabyte of data of
> images into hbase).
> >>
> >> -Jack
> >>
> >> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray
> <jg...@fb.com>
> wrote:
> >> > It is possible that it could be a bottleneck
> but usually is not.
> >>  Generally production HBase installations have
> long-lived clients, so
> >> the client-side caching is sufficient to reduce
> the amount of load to
> >> META (virtually 0 clean cluster is at steady-state
> / no region
> >> movement).
> >> >
> >> > For MapReduce, you do make new clients but
> generally only need to
> >> query for one region per task.
> >> >
> >> > It is not currently possible to split META.
>  We hard-coded some stuff
> >> a while back to make things easier and in the name
> of correctness.
> >> >
> >> > HBASE-3171 is about removing the ROOT region
> and putting the META
> >> region(s) locations into ZK directly.  When we
> make that change, we
> >> could probably also re-enable the splitting of
> META.
> >> >
> >> > JG
> >> >
> >> >> -----Original Message-----
> >> >> From: Jack Levin [mailto:magnito@gmail.com]
> >> >> Sent: Tuesday, November 23, 2010 9:31 AM
> >> >> To: user@hbase.apache.org
> >> >> Subject: question about meta data query
> intensity
> >> >>
> >> >> Hello, I am curious if there is a
> potential bottleneck in .META.
> >> >> ownership by a single region server.  Is
> it possible (safe) to split
> >> >> meta region into several?
> >> >>
> >> >> -Jack
> >> >
> >
> 


      

Re: question about meta data query intensity

Posted by Jean-Daniel Cryans <jd...@apache.org>.
0.89 is ok, 0.90 is still going through the RC process. I was asking
because it's a lot different in the new master.

With 10 minutes, some things will happen more slowly... like cleaning
up the split parents. Also, after a region server dies, it will take
some time until all the regions are assigned, depending on where that
rescan thread is in its sleep. There might be other things, but I'd
have to test to find them out.

J-D

On Tue, Nov 23, 2010 at 4:28 PM, Jack Levin <ma...@gmail.com> wrote:
> if I set it higher, say to 10 minutes, will there be an potential ill effects?
>
>  -Jack
>
> On Tue, Nov 23, 2010 at 4:24 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>> Jack, you didn't upgrade to 0.90 yet right? Then there's a master
>> background thread that scans .META. every minute... but with that
>> amount of rows it's probably best to set that much higher. The
>> config's name is hbase.master.meta.thread.rescanfrequency
>>
>> You should also take a look at your master log to see how long it's
>> taking to scan the whole thing currently. On one cluster here I have:
>>
>> 2010-11-23 16:22:21,307 INFO
>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
>> scanning meta region {server: 10.10.21.40:60020, regionname:
>> .META.,,1.1028785192, startKey: <>}
>> 2010-11-23 16:22:25,129 INFO
>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
>> scan of 7355 row(s) of meta region {server: 10.10.21.40:60020,
>> regionname: .META.,,1.1028785192, startKey: <>} complete
>>
>> Meaning that it took ~4 seconds to scan 7355 rows.
>>
>> J-D
>>
>> On Tue, Nov 23, 2010 at 4:15 PM, Jack Levin <ma...@gmail.com> wrote:
>>> its requests=6204 ... but we have not been loading cluster with
>>> queries at all.  I see that CPU is about 35% used vs other boxes at
>>> user cpu of 10% or so... So its really CPU load that worries me than
>>> the IO.
>>>
>>> -Jack
>>>
>>> On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
>>>> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>>>>> its REST, and generally no long lived clients, yes, caching of regions
>>>>> helps however, we expect long tail hits that will be uncached, which
>>>>> may stress out meta region, that being said, is it possible create
>>>>> affinity and nail meta region into a beefy server or set of beefy
>>>>> servers?
>>>>>
>>>>
>>>> The REST server should be caching region locations for you.
>>>>
>>>> On the .META. side, since its accessed so frequently, it should be
>>>> nailed into the block cache but if 1000 regions sitting beside that
>>>> .META. there could be contention.
>>>>
>>>> There is also hbase.client.prefetch.limit, the number of region
>>>> locations to fetch every time we do a lookup into .META. Currently its
>>>> set to 10.  You could try setting this down to 1?
>>>>
>>>> What are you seeing for request rates and load on the .META. hosting
>>>> regionserver?
>>>>
>>>> St.Ack
>>>>
>>>
>>
>

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
on 0.89 still...

On Tue, Nov 23, 2010 at 4:28 PM, Jack Levin <ma...@gmail.com> wrote:
> if I set it higher, say to 10 minutes, will there be an potential ill effects?
>
>  -Jack
>
> On Tue, Nov 23, 2010 at 4:24 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>> Jack, you didn't upgrade to 0.90 yet right? Then there's a master
>> background thread that scans .META. every minute... but with that
>> amount of rows it's probably best to set that much higher. The
>> config's name is hbase.master.meta.thread.rescanfrequency
>>
>> You should also take a look at your master log to see how long it's
>> taking to scan the whole thing currently. On one cluster here I have:
>>
>> 2010-11-23 16:22:21,307 INFO
>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
>> scanning meta region {server: 10.10.21.40:60020, regionname:
>> .META.,,1.1028785192, startKey: <>}
>> 2010-11-23 16:22:25,129 INFO
>> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
>> scan of 7355 row(s) of meta region {server: 10.10.21.40:60020,
>> regionname: .META.,,1.1028785192, startKey: <>} complete
>>
>> Meaning that it took ~4 seconds to scan 7355 rows.
>>
>> J-D
>>
>> On Tue, Nov 23, 2010 at 4:15 PM, Jack Levin <ma...@gmail.com> wrote:
>>> its requests=6204 ... but we have not been loading cluster with
>>> queries at all.  I see that CPU is about 35% used vs other boxes at
>>> user cpu of 10% or so... So its really CPU load that worries me than
>>> the IO.
>>>
>>> -Jack
>>>
>>> On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
>>>> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>>>>> its REST, and generally no long lived clients, yes, caching of regions
>>>>> helps however, we expect long tail hits that will be uncached, which
>>>>> may stress out meta region, that being said, is it possible create
>>>>> affinity and nail meta region into a beefy server or set of beefy
>>>>> servers?
>>>>>
>>>>
>>>> The REST server should be caching region locations for you.
>>>>
>>>> On the .META. side, since its accessed so frequently, it should be
>>>> nailed into the block cache but if 1000 regions sitting beside that
>>>> .META. there could be contention.
>>>>
>>>> There is also hbase.client.prefetch.limit, the number of region
>>>> locations to fetch every time we do a lookup into .META. Currently its
>>>> set to 10.  You could try setting this down to 1?
>>>>
>>>> What are you seeing for request rates and load on the .META. hosting
>>>> regionserver?
>>>>
>>>> St.Ack
>>>>
>>>
>>
>

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
If I set it higher, say to 10 minutes, will there be any potential ill effects?

 -Jack

On Tue, Nov 23, 2010 at 4:24 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Jack, you didn't upgrade to 0.90 yet right? Then there's a master
> background thread that scans .META. every minute... but with that
> amount of rows it's probably best to set that much higher. The
> config's name is hbase.master.meta.thread.rescanfrequency
>
> You should also take a look at your master log to see how long it's
> taking to scan the whole thing currently. On one cluster here I have:
>
> 2010-11-23 16:22:21,307 INFO
> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
> scanning meta region {server: 10.10.21.40:60020, regionname:
> .META.,,1.1028785192, startKey: <>}
> 2010-11-23 16:22:25,129 INFO
> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
> scan of 7355 row(s) of meta region {server: 10.10.21.40:60020,
> regionname: .META.,,1.1028785192, startKey: <>} complete
>
> Meaning that it took ~4 seconds to scan 7355 rows.
>
> J-D
>
> On Tue, Nov 23, 2010 at 4:15 PM, Jack Levin <ma...@gmail.com> wrote:
>> its requests=6204 ... but we have not been loading cluster with
>> queries at all.  I see that CPU is about 35% used vs other boxes at
>> user cpu of 10% or so... So its really CPU load that worries me than
>> the IO.
>>
>> -Jack
>>
>> On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
>>> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>>>> its REST, and generally no long lived clients, yes, caching of regions
>>>> helps however, we expect long tail hits that will be uncached, which
>>>> may stress out meta region, that being said, is it possible create
>>>> affinity and nail meta region into a beefy server or set of beefy
>>>> servers?
>>>>
>>>
>>> The REST server should be caching region locations for you.
>>>
>>> On the .META. side, since its accessed so frequently, it should be
>>> nailed into the block cache but if 1000 regions sitting beside that
>>> .META. there could be contention.
>>>
>>> There is also hbase.client.prefetch.limit, the number of region
>>> locations to fetch every time we do a lookup into .META. Currently its
>>> set to 10.  You could try setting this down to 1?
>>>
>>> What are you seeing for request rates and load on the .META. hosting
>>> regionserver?
>>>
>>> St.Ack
>>>
>>
>

Re: question about meta data query intensity

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Jack, you didn't upgrade to 0.90 yet, right? Then there's a master
background thread that scans .META. every minute... but with that
amount of rows it's probably best to set it much higher. The
config's name is hbase.master.meta.thread.rescanfrequency.
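
For reference, a sketch of the override; the value is in milliseconds (the
default of 60000 matches the once-a-minute scan described above), and on a live
cluster it belongs in the master's hbase-site.xml rather than in client code;
the snippet below only illustrates the key and units:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MetaRescanFrequency {
  public static void main(String[] args) {
    Configuration conf = new HBaseConfiguration();
    // Raise the .META. rescan interval from the 60000 ms default to 10 minutes.
    conf.setLong("hbase.master.meta.thread.rescanfrequency", 10 * 60 * 1000L);
    System.out.println(conf.get("hbase.master.meta.thread.rescanfrequency"));
  }
}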

You should also take a look at your master log to see how long it's
taking to scan the whole thing currently. On one cluster here I have:

2010-11-23 16:22:21,307 INFO
org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {server: 10.10.21.40:60020, regionname:
.META.,,1.1028785192, startKey: <>}
2010-11-23 16:22:25,129 INFO
org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 7355 row(s) of meta region {server: 10.10.21.40:60020,
regionname: .META.,,1.1028785192, startKey: <>} complete

Meaning that it took ~4 seconds to scan 7355 rows.

J-D

On Tue, Nov 23, 2010 at 4:15 PM, Jack Levin <ma...@gmail.com> wrote:
> its requests=6204 ... but we have not been loading cluster with
> queries at all.  I see that CPU is about 35% used vs other boxes at
> user cpu of 10% or so... So its really CPU load that worries me than
> the IO.
>
> -Jack
>
> On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
>> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>>> its REST, and generally no long lived clients, yes, caching of regions
>>> helps however, we expect long tail hits that will be uncached, which
>>> may stress out meta region, that being said, is it possible create
>>> affinity and nail meta region into a beefy server or set of beefy
>>> servers?
>>>
>>
>> The REST server should be caching region locations for you.
>>
>> On the .META. side, since its accessed so frequently, it should be
>> nailed into the block cache but if 1000 regions sitting beside that
>> .META. there could be contention.
>>
>> There is also hbase.client.prefetch.limit, the number of region
>> locations to fetch every time we do a lookup into .META. Currently its
>> set to 10.  You could try setting this down to 1?
>>
>> What are you seeing for request rates and load on the .META. hosting
>> regionserver?
>>
>> St.Ack
>>
>

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
It's taking some queries, but at 3% of the rate we expect to give it later.

-Jack

On Tue, Nov 23, 2010 at 4:19 PM, Stack <st...@duboce.net> wrote:
> To be clear, the cluster is not taking queries and .META. is still
> being hit at rate of 6k/second?
> St.Ack
>
> On Tue, Nov 23, 2010 at 4:15 PM, Jack Levin <ma...@gmail.com> wrote:
>> its requests=6204 ... but we have not been loading cluster with
>> queries at all.  I see that CPU is about 35% used vs other boxes at
>> user cpu of 10% or so... So its really CPU load that worries me than
>> the IO.
>>
>> -Jack
>>
>> On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
>>> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>>>> its REST, and generally no long lived clients, yes, caching of regions
>>>> helps however, we expect long tail hits that will be uncached, which
>>>> may stress out meta region, that being said, is it possible create
>>>> affinity and nail meta region into a beefy server or set of beefy
>>>> servers?
>>>>
>>>
>>> The REST server should be caching region locations for you.
>>>
>>> On the .META. side, since its accessed so frequently, it should be
>>> nailed into the block cache but if 1000 regions sitting beside that
>>> .META. there could be contention.
>>>
>>> There is also hbase.client.prefetch.limit, the number of region
>>> locations to fetch every time we do a lookup into .META. Currently its
>>> set to 10.  You could try setting this down to 1?
>>>
>>> What are you seeing for request rates and load on the .META. hosting
>>> regionserver?
>>>
>>> St.Ack
>>>
>>
>

Re: question about meta data query intensity

Posted by Stack <st...@duboce.net>.
To be clear, the cluster is not taking queries and .META. is still
being hit at a rate of 6k/second?
St.Ack

On Tue, Nov 23, 2010 at 4:15 PM, Jack Levin <ma...@gmail.com> wrote:
> its requests=6204 ... but we have not been loading cluster with
> queries at all.  I see that CPU is about 35% used vs other boxes at
> user cpu of 10% or so... So its really CPU load that worries me than
> the IO.
>
> -Jack
>
> On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
>> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>>> its REST, and generally no long lived clients, yes, caching of regions
>>> helps however, we expect long tail hits that will be uncached, which
>>> may stress out meta region, that being said, is it possible create
>>> affinity and nail meta region into a beefy server or set of beefy
>>> servers?
>>>
>>
>> The REST server should be caching region locations for you.
>>
>> On the .META. side, since its accessed so frequently, it should be
>> nailed into the block cache but if 1000 regions sitting beside that
>> .META. there could be contention.
>>
>> There is also hbase.client.prefetch.limit, the number of region
>> locations to fetch every time we do a lookup into .META. Currently its
>> set to 10.  You could try setting this down to 1?
>>
>> What are you seeing for request rates and load on the .META. hosting
>> regionserver?
>>
>> St.Ack
>>
>

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
It's requests=6204... but we have not been loading the cluster with
queries at all.  I see that CPU is about 35% used vs. other boxes at
user CPU of 10% or so... So it's really the CPU load that worries me
more than the IO.

-Jack

On Tue, Nov 23, 2010 at 1:55 PM, Stack <st...@duboce.net> wrote:
> On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
>> its REST, and generally no long lived clients, yes, caching of regions
>> helps however, we expect long tail hits that will be uncached, which
>> may stress out meta region, that being said, is it possible create
>> affinity and nail meta region into a beefy server or set of beefy
>> servers?
>>
>
> The REST server should be caching region locations for you.
>
> On the .META. side, since its accessed so frequently, it should be
> nailed into the block cache but if 1000 regions sitting beside that
> .META. there could be contention.
>
> There is also hbase.client.prefetch.limit, the number of region
> locations to fetch every time we do a lookup into .META. Currently its
> set to 10.  You could try setting this down to 1?
>
> What are you seeing for request rates and load on the .META. hosting
> regionserver?
>
> St.Ack
>

Re: question about meta data query intensity

Posted by Stack <st...@duboce.net>.
On Tue, Nov 23, 2010 at 11:06 AM, Jack Levin <ma...@gmail.com> wrote:
> its REST, and generally no long lived clients, yes, caching of regions
> helps however, we expect long tail hits that will be uncached, which
> may stress out meta region, that being said, is it possible create
> affinity and nail meta region into a beefy server or set of beefy
> servers?
>

The REST server should be caching region locations for you.

On the .META. side, since it's accessed so frequently, it should be
nailed into the block cache, but with 1000 regions sitting beside that
.META. region there could be contention.

There is also hbase.client.prefetch.limit, the number of region
locations to fetch every time we do a lookup into .META. Currently it's
set to 10.  You could try setting this down to 1?
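
As a sketch of turning that knob down on the client side (the table name is a
placeholder; the property is read by the client when it looks up region
locations in .META.):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class PrefetchLimitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();
    // Fetch only 1 region location per .META. lookup instead of the default 10.
    conf.setInt("hbase.client.prefetch.limit", 1);
    HTable table = new HTable(conf, "images");   // placeholder table name
    System.out.println("prefetch limit: " + conf.getInt("hbase.client.prefetch.limit", 10));
    table.close();
  }
}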

What are you seeing for request rates and load on the .META. hosting
regionserver?

St.Ack

RE: question about meta data query intensity

Posted by Jonathan Gray <jg...@fb.com>.
Not today.

If this is of serious concern to you, I'd say drop some comments into HBASE-3171 and I can look at doing that sooner rather than later for 0.92.  But that is still a medium-term fix, as it'll take a little time to stabilize that big of a change.  And I'd like to drop ROOT first and stabilize that before we support multiple META regions.

With REST, you actually have long-lived clients because the REST server stays up.  I'm not sure if the REST servers do any pre-fetching of META, but you could also do that, which would pre-load all region info and locations into the client cache.
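
One possible warm-up along those lines, sketched against the 0.90-era client API
(the table name is a placeholder; whether the REST server already does anything
like this is, per the above, not settled):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class WarmRegionCache {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "images");            // placeholder table name
    byte[][] startKeys = table.getStartEndKeys().getFirst();
    for (byte[] startKey : startKeys) {
      // Each lookup goes through HConnectionManager and caches the region location.
      table.getRegionLocation(startKey);
    }
    System.out.println("cached locations for " + startKeys.length + " regions");
  }
}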

JG

> -----Original Message-----
> From: Jack Levin [mailto:magnito@gmail.com]
> Sent: Tuesday, November 23, 2010 11:07 AM
> To: user@hbase.apache.org
> Subject: Re: question about meta data query intensity
> 
> its REST, and generally no long lived clients, yes, caching of regions
> helps however, we expect long tail hits that will be uncached, which
> may stress out meta region, that being said, is it possible create
> affinity and nail meta region into a beefy server or set of beefy
> servers?
> 
> -Jack
> 
> On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com> wrote:
> > Are you going to have long-lived clients?  How are you accessing
> HBase?  REST or Thrift gateways?  Caching of region locations should
> help significantly so that it's only a bottleneck right at the startup
> of the cluster/gateways/clients.
> >
> >> -----Original Message-----
> >> From: Jack Levin [mailto:magnito@gmail.com]
> >> Sent: Tuesday, November 23, 2010 10:53 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: question about meta data query intensity
> >>
> >> my concern is that we plane to have 120 regionservers with 1000
> >> Regions each, so the hits to meta could be quite intense.  (why so
> >> many regions? we are storing 1 Petabyte of data of images into
> hbase).
> >>
> >> -Jack
> >>
> >> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray <jg...@fb.com> wrote:
> >> > It is possible that it could be a bottleneck but usually is not.
> >>  Generally production HBase installations have long-lived clients,
> so
> >> the client-side caching is sufficient to reduce the amount of load
> to
> >> META (virtually 0 clean cluster is at steady-state / no region
> >> movement).
> >> >
> >> > For MapReduce, you do make new clients but generally only need to
> >> query for one region per task.
> >> >
> >> > It is not currently possible to split META.  We hard-coded some
> stuff
> >> a while back to make things easier and in the name of correctness.
> >> >
> >> > HBASE-3171 is about removing the ROOT region and putting the META
> >> region(s) locations into ZK directly.  When we make that change, we
> >> could probably also re-enable the splitting of META.
> >> >
> >> > JG
> >> >
> >> >> -----Original Message-----
> >> >> From: Jack Levin [mailto:magnito@gmail.com]
> >> >> Sent: Tuesday, November 23, 2010 9:31 AM
> >> >> To: user@hbase.apache.org
> >> >> Subject: question about meta data query intensity
> >> >>
> >> >> Hello, I am curious if there is a potential bottleneck in .META.
> >> >> ownership by a single region server.  Is it possible (safe) to
> split
> >> >> meta region into several?
> >> >>
> >> >> -Jack
> >> >
> >

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
It's REST, and generally no long-lived clients. Yes, caching of regions
helps; however, we expect long-tail hits that will be uncached, which
may stress out the meta region. That being said, is it possible to create
affinity and nail the meta region to a beefy server or set of beefy
servers?

-Jack

On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray <jg...@fb.com> wrote:
> Are you going to have long-lived clients?  How are you accessing HBase?  REST or Thrift gateways?  Caching of region locations should help significantly so that it's only a bottleneck right at the startup of the cluster/gateways/clients.
>
>> -----Original Message-----
>> From: Jack Levin [mailto:magnito@gmail.com]
>> Sent: Tuesday, November 23, 2010 10:53 AM
>> To: user@hbase.apache.org
>> Subject: Re: question about meta data query intensity
>>
>> my concern is that we plane to have 120 regionservers with 1000
>> Regions each, so the hits to meta could be quite intense.  (why so
>> many regions? we are storing 1 Petabyte of data of images into hbase).
>>
>> -Jack
>>
>> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray <jg...@fb.com> wrote:
>> > It is possible that it could be a bottleneck but usually is not.
>>  Generally production HBase installations have long-lived clients, so
>> the client-side caching is sufficient to reduce the amount of load to
>> META (virtually 0 clean cluster is at steady-state / no region
>> movement).
>> >
>> > For MapReduce, you do make new clients but generally only need to
>> query for one region per task.
>> >
>> > It is not currently possible to split META.  We hard-coded some stuff
>> a while back to make things easier and in the name of correctness.
>> >
>> > HBASE-3171 is about removing the ROOT region and putting the META
>> region(s) locations into ZK directly.  When we make that change, we
>> could probably also re-enable the splitting of META.
>> >
>> > JG
>> >
>> >> -----Original Message-----
>> >> From: Jack Levin [mailto:magnito@gmail.com]
>> >> Sent: Tuesday, November 23, 2010 9:31 AM
>> >> To: user@hbase.apache.org
>> >> Subject: question about meta data query intensity
>> >>
>> >> Hello, I am curious if there is a potential bottleneck in .META.
>> >> ownership by a single region server.  Is it possible (safe) to split
>> >> meta region into several?
>> >>
>> >> -Jack
>> >
>

RE: question about meta data query intensity

Posted by Jonathan Gray <jg...@fb.com>.
Are you going to have long-lived clients?  How are you accessing HBase?  REST or Thrift gateways?  Caching of region locations should help significantly so that it's only a bottleneck right at the startup of the cluster/gateways/clients.

> -----Original Message-----
> From: Jack Levin [mailto:magnito@gmail.com]
> Sent: Tuesday, November 23, 2010 10:53 AM
> To: user@hbase.apache.org
> Subject: Re: question about meta data query intensity
> 
> my concern is that we plane to have 120 regionservers with 1000
> Regions each, so the hits to meta could be quite intense.  (why so
> many regions? we are storing 1 Petabyte of data of images into hbase).
> 
> -Jack
> 
> On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray <jg...@fb.com> wrote:
> > It is possible that it could be a bottleneck but usually is not.
>  Generally production HBase installations have long-lived clients, so
> the client-side caching is sufficient to reduce the amount of load to
> META (virtually 0 clean cluster is at steady-state / no region
> movement).
> >
> > For MapReduce, you do make new clients but generally only need to
> query for one region per task.
> >
> > It is not currently possible to split META.  We hard-coded some stuff
> a while back to make things easier and in the name of correctness.
> >
> > HBASE-3171 is about removing the ROOT region and putting the META
> region(s) locations into ZK directly.  When we make that change, we
> could probably also re-enable the splitting of META.
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Jack Levin [mailto:magnito@gmail.com]
> >> Sent: Tuesday, November 23, 2010 9:31 AM
> >> To: user@hbase.apache.org
> >> Subject: question about meta data query intensity
> >>
> >> Hello, I am curious if there is a potential bottleneck in .META.
> >> ownership by a single region server.  Is it possible (safe) to split
> >> meta region into several?
> >>
> >> -Jack
> >

Re: question about meta data query intensity

Posted by Jack Levin <ma...@gmail.com>.
My concern is that we plan to have 120 regionservers with 1000
regions each, so the hits to meta could be quite intense.  (Why so
many regions? We are storing 1 petabyte of image data in HBase.)

-Jack

On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray <jg...@fb.com> wrote:
> It is possible that it could be a bottleneck but usually is not.  Generally production HBase installations have long-lived clients, so the client-side caching is sufficient to reduce the amount of load to META (virtually 0 clean cluster is at steady-state / no region movement).
>
> For MapReduce, you do make new clients but generally only need to query for one region per task.
>
> It is not currently possible to split META.  We hard-coded some stuff a while back to make things easier and in the name of correctness.
>
> HBASE-3171 is about removing the ROOT region and putting the META region(s) locations into ZK directly.  When we make that change, we could probably also re-enable the splitting of META.
>
> JG
>
>> -----Original Message-----
>> From: Jack Levin [mailto:magnito@gmail.com]
>> Sent: Tuesday, November 23, 2010 9:31 AM
>> To: user@hbase.apache.org
>> Subject: question about meta data query intensity
>>
>> Hello, I am curious if there is a potential bottleneck in .META.
>> ownership by a single region server.  Is it possible (safe) to split
>> meta region into several?
>>
>> -Jack
>

Re: question about meta data query intensity

Posted by Ted Yu <yu...@gmail.com>.
We're facing the loss of /hbase/root-region-server ZNode:

2010-11-23 17:49:11,288 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply
sessionid:0x12c79bef0c10012, packet:: clientPath:null serverPath:null
finished:false header:: 160,4  replyHeader:: 160,62,-101  request::
'/hbase/root-region-server,F  response::
2010-11-23 17:49:11,288 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to read:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
NoNode for /hbase/root-region-server

How do I get it back ?

Thanks

On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray <jg...@fb.com> wrote:

> It is possible that it could be a bottleneck but usually is not.  Generally
> production HBase installations have long-lived clients, so the client-side
> caching is sufficient to reduce the amount of load to META (virtually 0
> clean cluster is at steady-state / no region movement).
>
> For MapReduce, you do make new clients but generally only need to query for
> one region per task.
>
> It is not currently possible to split META.  We hard-coded some stuff a
> while back to make things easier and in the name of correctness.
>
> HBASE-3171 is about removing the ROOT region and putting the META region(s)
> locations into ZK directly.  When we make that change, we could probably
> also re-enable the splitting of META.
>
> JG
>
> > -----Original Message-----
> > From: Jack Levin [mailto:magnito@gmail.com]
> > Sent: Tuesday, November 23, 2010 9:31 AM
> > To: user@hbase.apache.org
> > Subject: question about meta data query intensity
> >
> > Hello, I am curious if there is a potential bottleneck in .META.
> > ownership by a single region server.  Is it possible (safe) to split
> > meta region into several?
> >
> > -Jack
>

RE: question about meta data query intensity

Posted by Jonathan Gray <jg...@fb.com>.
It is possible that it could be a bottleneck but usually is not.  Generally production HBase installations have long-lived clients, so the client-side caching is sufficient to reduce the amount of load to META (virtually 0 when the cluster is at steady-state / no region movement).

For MapReduce, you do make new clients but generally only need to query for one region per task.

It is not currently possible to split META.  We hard-coded some stuff a while back to make things easier and in the name of correctness.

HBASE-3171 is about removing the ROOT region and putting the META region(s) locations into ZK directly.  When we make that change, we could probably also re-enable the splitting of META.

JG

> -----Original Message-----
> From: Jack Levin [mailto:magnito@gmail.com]
> Sent: Tuesday, November 23, 2010 9:31 AM
> To: user@hbase.apache.org
> Subject: question about meta data query intensity
> 
> Hello, I am curious if there is a potential bottleneck in .META.
> ownership by a single region server.  Is it possible (safe) to split
> meta region into several?
> 
> -Jack