Posted to dev@hbase.apache.org by Vladimir Rodionov <vr...@carrieriq.com> on 2013/07/29 23:36:51 UTC

Wrong RPS stats?

I am running a simple performance test to evaluate the maximum read IO we can get from 1 RS. The client tells me 480 RPS,
but the RS stats (from the HBase UI) give me a totally different number:
 
xxx.xxx.com,60020,1375123451578	Mon Jul 29 18:44:11 GMT+00:00 2013	requestsPerSecond=11684, numberOfOnlineRegions=398, usedHeapMB=70, maxHeapMB=5987 

11684 RPS. 

HBase 0.94.6.1 (CDH4.3)

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Monday, July 29, 2013 10:26 AM
To: dev@hbase.apache.org; lars hofhansl
Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

We have 20 RS and table was initially pre-split into 20 regions.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Sunday, July 28, 2013 10:39 PM
To: dev@hbase.apache.org
Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

It starts with smaller regions and then ramps up the size as the table in question is growing in size.
That makes for a more even distribution of regions.

Into how many regions did you presplit your table? How many region servers do you have available in this cluster?

Maybe that number was less than what that policy estimated to be a good number of regions for your cluster.

-- Lars
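
The ramp-up Lars describes can be illustrated with a small sketch. It is not taken from the thread: it assumes the 0.94-era check was min(max file size, flush size * n^2), where n is the number of regions of the same table on the region server, and it assumes a 128 MB memstore flush size. The function name `split_threshold` is hypothetical.

```python
# Sketch of the size check attributed to IncreasingToUpperBoundRegionSplitPolicy
# in HBase 0.94. ASSUMPTIONS: the formula and the 128 MB flush size are
# recalled from the 0.94 line, not stated anywhere in this thread.

MB = 1024 * 1024
FLUSH_SIZE = 128 * MB        # assumed hbase.hregion.memstore.flush.size
MAX_FILESIZE = 10737418240   # hbase.hregion.max.filesize quoted in the thread


def split_threshold(regions_on_server, flush_size=FLUSH_SIZE,
                    max_filesize=MAX_FILESIZE):
    """Return the region size (bytes) that would trigger a split, given how
    many regions of the same table the hosting region server carries."""
    if regions_on_server == 0:
        return max_filesize
    return min(max_filesize, flush_size * regions_on_server ** 2)


if __name__ == "__main__":
    # Print the threshold in MB for 1..10 regions of the table per server.
    for n in range(1, 11):
        print(n, split_threshold(n) // MB)
```

Under these assumptions the threshold only reaches the configured 10 GB once a server holds 9 regions of the table. With 20 region servers and 109 regions (about 5 per server) the threshold would sit near 3 GB, which would be consistent with the ~3.6GB average reported in this thread.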



----- Original Message -----
From: Vladimir Rodionov <vr...@carrieriq.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>
Cc:
Sent: Sunday, July 28, 2013 10:25 PM
Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

Thanks, Ted. Is there any rationale behind IncreasingToUpperBoundRegionSplitPolicy?
Why is it better than ConstantSizeRegionSplitPolicy?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ted Yu [yuzhihong@gmail.com]
Sent: Sunday, July 28, 2013 8:52 PM
To: dev@hbase.apache.org
Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

In 0.94 there're several split policies available:

./src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java

From ./src/main/java/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.java:

* @see IncreasingToUpperBoundRegionSplitPolicy Default split policy since
*      0.94.0
* @see ConstantSizeRegionSplitPolicy Default split policy before 0.94.0

Cheers

On Sun, Jul 28, 2013 at 8:39 PM, Vladimir Rodionov <vr...@carrieriq.com> wrote:

> Yes, I pre-split the table
>
> Out of 109 regions only 3 are empty (wrong assumption on key distribution).
> If regions were split at 10GB, we would have more than 500GB in 109 regions, not 376GB.
> I do not understand how the region splitting algorithm works.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Sunday, July 28, 2013 5:50 PM
> To: dev@hbase.apache.org
> Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
>
> "Yes it works, of course." It's not working for me ;) so was not sure.
>
> It's normal to have regions under half of MAX_FILESIZE. When a
> region grows beyond MAX_FILESIZE it is split in 2, so one half can be
> larger and the other smaller.
>
> I would say an average of 5GB would have been a good value, but even 3.6 is
> still not so bad.
>
> Have you pre-split the regions initially? Is it possible that you have
> unused pre-split regions?
>
> You can use Hannibal to have a quick view of what the sizes are
>
> JM
>
> 2013/7/28 Vladimir Rodionov <vr...@carrieriq.com>
>
> > The final stats:
> >
> > Total HDFS size - 376GB
> > #regions: 109 - avg. region size ~ 3.6GB
> >
> > Something is wrong here. I expected fewer regions. The regions get split
> > at sizes much lower than
> > hbase.hregion.max.filesize and/or  MAX_FILESIZE.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> > ________________________________________
> > From: Vladimir Rodionov
> > Sent: Sunday, July 28, 2013 3:35 PM
> > To: dev@hbase.apache.org
> > Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
> >
> > Yes it works, of course.
> >
> > It's in the original post: ~10GB
> >
> > <property>
> > <name>hbase.hregion.max.filesize</name>
> >    <value>10737418240</value>
> >    <source>hbase-site.xml</source>
> > </property>
> >
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> > ________________________________________
> > From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> > Sent: Sunday, July 28, 2013 2:30 PM
> > To: dev@hbase.apache.org
> > Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
> >
> > Hi Vladimir,
> >
> > Is this link working for you? http://MASTERURL:60010/conf ? If yes, what do
> > you have for hbase.hregion.max.filesize? This is to make sure the property
> > below is considered.
> >
> > For the table config, did you get it from the webui?
> >
> > JM
> >
> > 2013/7/28 Vladimir Rodionov <vr...@carrieriq.com>
> >
> > > but all regions keep getting split at 1Gb
> > >
> > > I have 71 regions and 70GB of data in 'usertable' despite the fact that:
> > >
> > > table config is:
> > > {NAME => 'usertable', DEFERRED_LOG_FLUSH => 'true', MAX_FILESIZE =>
> > > '10000000000', FAMILIES => [{NAME => 'cf', BLOOMFILTER => 'ROWCOL',
> > > VERSIONS => '1', COMPRESSION => 'GZ'}]}
> > >
> > > and hbase-site.xml has the following config:
> > >
> > > <property>
> > >    <name>hbase.hregion.max.filesize</name>
> > >    <value>10737418240</value>
> > >    <source>hbase-site.xml</source>
> > > </property>
> > >
> > > HBase 0.94.6
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > Confidentiality Notice:  The information contained in this message,
> > > including any attachments hereto, may be confidential and is intended to be
> > > read only by the individual or entity to whom this message is addressed. If
> > > the reader of this message is not the intended recipient or an agent or
> > > designee of the intended recipient, please note that any review, use,
> > > disclosure or distribution of this message or its attachments, in any form,
> > > is strictly prohibited.  If you have received this message in error, please
> > > immediately notify the sender and/or Notifications@carrieriq.com and
> > > delete or destroy any copy of this message and its attachments.
> > >
> >

Re: Wrong RPS stats?

Posted by Ted Yu <yu...@gmail.com>.
bq. The client tells me 480 RPS

Were you using YCSB as the client?

On Mon, Jul 29, 2013 at 2:49 PM, Vladimir Rodionov <vr...@carrieriq.com> wrote:

> Nope, I am doing simple table.get(get) - not a batch request
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: lars hofhansl [larsh@apache.org]
> Sent: Monday, July 29, 2013 2:39 PM
> To: dev@hbase.apache.org
> Subject: Re: Wrong RPS stats?
>
> Are you doing bulk requests, put(List<Put>), etc.? If so you get X RPCs and
> X*N requests if the list has N items on average.
> I admit this is a bit confusing.
>
> -- Lars

RE: Wrong RPS stats?

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Nope, I am doing a simple table.get(get), not a batch request.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Re: Wrong RPS stats?

Posted by lars hofhansl <la...@apache.org>.
Are you doing bulk requests, put(List<Put>), etc.? If so you get X RPCs and X*N requests if the list has N items on average.
I admit this is a bit confusing.

-- Lars
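
The accounting Lars describes can be sketched as plain arithmetic. The function names below are hypothetical and only illustrate the X RPCs versus X*N requests relationship; they are not HBase APIs.

```python
# Illustrative arithmetic only: how a per-item request counter on the server
# can exceed the RPC rate seen by a client that sends batched calls.

def server_side_requests(rpc_calls, items_per_call):
    """X RPCs carrying N items each are counted as X * N requests."""
    return rpc_calls * items_per_call


def implied_batch_size(server_requests, client_rpcs):
    """Average batch size that would explain a server/client mismatch."""
    return server_requests / client_rpcs


if __name__ == "__main__":
    # Single gets (N = 1): both sides should report the same number.
    print(server_side_requests(480, 1))
    # Batch size needed to turn 480 client RPS into 11684 server RPS.
    print(round(implied_batch_size(11684, 480), 1))
```

For the numbers in this thread, batching would have to average roughly 24 items per call to explain the gap. Since the client is reported to issue single table.get(get) calls, batching alone would not account for the mismatch.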



----- Original Message -----
From: Vladimir Rodionov <vr...@carrieriq.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>
Cc: 
Sent: Monday, July 29, 2013 2:36 PM
Subject: Wrong RPS stats?

I am running simple performance test to evaluate maximum read IO we can get from 1 RS. The client tells me 480 RPS,
the RS stats (from HBase UI) gives me totally different number:

xxx.xxx.com,60020,1375123451578    Mon Jul 29 18:44:11 GMT+00:00 2013    requestsPerSecond=11684, numberOfOnlineRegions=398, usedHeapMB=70, maxHeapMB=5987 

11684 RPS. 

HBase 0.94.6.1 (CDH4.3)

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Monday, July 29, 2013 10:26 AM
To: dev@hbase.apache.org; lars hofhansl
Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

We have 20 RS and table was initially pre-split into 20 regions.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: lars hofhansl [larsh@apache.org]
Sent: Sunday, July 28, 2013 10:39 PM
To: dev@hbase.apache.org
Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

It starts with smaller regions and then ramps up the split size as the table in question grows.
That makes for a more even distribution of regions.

Into how many regions did you presplit your table? How many region servers do you have available in this cluster?

Maybe that number was less than what that policy estimated to be a good number of regions for your cluster.
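
As a rough illustration of that ramp-up, the sketch below (Python, not the HBase source) assumes the 0.94 policy splits a region once it exceeds min(max file size, flush size * R^2), where R is the number of regions of the table on the same region server. The formula, the 128MB flush size, and the function name are assumptions here and should be checked against IncreasingToUpperBoundRegionSplitPolicy.java.

```python
MB = 1024 * 1024


def split_threshold(regions_on_server,
                    flush_size=128 * MB,          # assumed memstore flush size
                    max_file_size=10 * 1024 * MB  # 10GB, as configured in this thread
                    ):
    """Size at which a region would split under the assumed formula."""
    if regions_on_server == 0:
        return max_file_size
    # Threshold grows with the square of the region count, capped at the maximum.
    return min(max_file_size, flush_size * regions_on_server ** 2)


# Print the threshold in MB for 1 to 10 regions of the table on one server.
for r in range(1, 11):
    print(r, split_threshold(r) // MB, "MB")
```

Under these assumptions, a server holding about five regions of the table would split them at roughly 3.2GB, which is in the same range as the ~3.6GB average region size reported in this thread for 109 regions on 20 servers.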

-- Lars



----- Original Message -----
From: Vladimir Rodionov <vr...@carrieriq.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>
Cc:
Sent: Sunday, July 28, 2013 10:25 PM
Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

Thanks, Ted. Is there any rationale behind IncreasingToUpperBoundRegionSplitPolicy?
Why is it better than ConstantSizeRegionSplitPolicy?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ted Yu [yuzhihong@gmail.com]
Sent: Sunday, July 28, 2013 8:52 PM
To: dev@hbase.apache.org
Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb

In 0.94 there're several split policies available:

./src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java

From
./src/main/java/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.java
:

* @see IncreasingToUpperBoundRegionSplitPolicy Default split policy since
*      0.94.0
* @see ConstantSizeRegionSplitPolicy Default split policy before 0.94.0

Cheers

On Sun, Jul 28, 2013 at 8:39 PM, Vladimir Rodionov
<vr...@carrieriq.com> wrote:

> Yes, I pre-split the table.
>
> Out of 109 regions only 3 are empty (wrong assumption on key distribution).
> If regions split at 10GB, we would have > 500GB in 109 regions, not 376GB.
> I do not understand how the region splitting algorithm works.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Sunday, July 28, 2013 5:50 PM
> To: dev@hbase.apache.org
> Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
>
> "Yes it works, of course." It's not working for me ;) so was not sure.
>
> It's normal to have regions under half of MAX_FILESIZE. When a
> region grows beyond MAX_FILESIZE it is split in 2. So one can be more,
> and the other one can be less.
>
> I would say an average of 5GB would have been a good value, but even 3.6 is
> still not so bad.
>
> Did you pre-split the regions initially? Is it possible that you have
> unused pre-split regions?
>
> You can use Hannibal to get a quick view of what the sizes are.
>
> JM
>
> 2013/7/28 Vladimir Rodionov <vr...@carrieriq.com>
>
> > The final stats:
> >
> > Total HDFS size - 376GB
> > #regions: 109 - avg. region size ~ 3.6GB
> >
> > Something is wrong here. I expected fewer regions. The regions get split
> > at sizes much lower than
> > hbase.hregion.max.filesize and/or  MAX_FILESIZE.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> > ________________________________________
> > From: Vladimir Rodionov
> > Sent: Sunday, July 28, 2013 3:35 PM
> > To: dev@hbase.apache.org
> > Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
> >
> > Yes it works, of course.
> >
> > It's in the original post: ~10GB
> >
> > <property>
> > <name>hbase.hregion.max.filesize</name>
> >    <value>10737418240</value>
> >    <source>hbase-site.xml</source>
> > </property>
> >
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodionov@carrieriq.com
> >
> > ________________________________________
> > From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> > Sent: Sunday, July 28, 2013 2:30 PM
> > To: dev@hbase.apache.org
> > Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
> >
> > Hi Vladimir,
> >
> > Is this link working for you? http://MASTERURL:60010/conf ? If yes, what do
> > you have for hbase.hregion.max.filesize? This is to make sure the property
> > below is considered.
> >
> > For the table config, did you get it from the webui?
> >
> > JM
> >
> > 2013/7/28 Vladimir Rodionov <vr...@carrieriq.com>
> >
> > > but all regions keep getting split at 1GB
> > >
> > > I have 71 regions and 70GB of data in 'usertable' despite the fact that:
> > >
> > > table config is:
> > > {NAME => 'usertable', DEFERRED_LOG_FLUSH => 'true', MAX_FILESIZE =>
> > > '10000000000', FAMILIES => [{NAME => 'cf', BLOOMFILTER => 'ROWCOL',
> > > VERSIONS => '1', COMPRESSION => 'GZ'}]}
> > >
> > > and hbase-site.xml has the following config:
> > >
> > > <property>
> > >    <name>hbase.hregion.max.filesize</name>
> > >    <value>10737418240</value>
> > >    <source>hbase-site.xml</source>
> > > </property>
> > >
> > > HBase 0.94.6
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com