
Posted to user@hbase.apache.org by Sever Fundatureanu <fu...@gmail.com> on 2012/07/01 22:22:46 UTC

Re: Finding the correct region server

Hello,

You can find a great explanation of HBase file locality in HDFS here:
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
The short answer is that HBase file locality improves over time. Compactions
rewrite a region's files, and HDFS places the first replica of each new block
as close as possible to the RegionServer (RS) doing the writing, preferring
the local node, then a node in the same rack, then a node in a remote rack.
So if you don't restart your cluster for a long time, there is a good chance
that your data is stored locally on the RS's machine.
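To make that placement order concrete, here is a toy model in Python. It is not HDFS's actual placement code, which also weighs load and free space and picks randomly among candidates; the `DataNode` class and `choose_first_replica` function are hypothetical names, and the sketch only encodes the local, same-rack, remote-rack preference for the first replica of a block written by an RS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    host: str
    rack: str


def choose_first_replica(writer_host: str, writer_rack: str, nodes: list[DataNode]) -> DataNode:
    """Toy model: pick the DataNode for the first replica of a new block.

    Preference order: the DataNode on the writer's own host, then any
    DataNode in the writer's rack, then any DataNode on a remote rack.
    Real HDFS also considers load and free space and chooses randomly
    among candidates; this sketch simply takes the first match.
    """
    if not nodes:
        raise ValueError("no DataNodes available")
    for node in nodes:
        if node.host == writer_host:
            return node  # local: same machine as the writing RS
    for node in nodes:
        if node.rack == writer_rack:
            return node  # same rack, different machine
    return nodes[0]  # only remote-rack nodes are left


nodes = [
    DataNode("machine2", "rack1"),
    DataNode("machine1", "rack1"),
    DataNode("machine3", "rack2"),
]
print(choose_first_replica("machine1", "rack1", nodes).host)  # machine1
```

In the setup quoted below, where every machine runs both an RS and a DataNode, a file compacted by Region Server 1 would get its first replica on Data Node 1.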


On Fri, Jun 29, 2012 at 7:15 PM, Ramchander Varadarajan <ramsci@yahoo-inc.com> wrote:

> Hi all,
>
> We are evaluating HBase to store metadata on a very large scale. Our
> current architecture looks like this:
>
> Machine 1:
>    Runs Client 1
>    Runs Region Server 1
>    Runs Data Node 1
>
> Machine n:
>    Runs Client n
>    Runs Region Server n
>    Runs Data Node n
>
> Now, say we have only one region for the data set at the moment, it's
> maxing out, and it is hosted by Region Server 1. If a flood of new
> requests comes in to Machine n and it tries to store the data, will Region
> Server n store it locally on Data Node n, or will the requests be routed
> to Region Server 1, with a new region created there after the split?
>
> I ask because I want to see whether a client can be made sticky to a
> region server. That way, if a user with the id 1111 comes in, he will
> always be sent to Client 1, because we know Region Server 1 has his
> region; we would work that out up front from his id. We're just trying
> to minimize latency further. (Of course I understand that if nodes are
> down, the traffic for the users in that bucket would have to be routed
> to another host.)
>
> Thanks in advance
>
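The routing question can be illustrated with a toy model of how an HBase client finds a region. Each region covers a contiguous range of row keys beginning at its start key (the first region starts at the empty key), and at any moment it is served by exactly one RS; the client looks up, and caches, which RS serves the region containing a row, then sends the request there, whichever machine the client itself runs on. This is a minimal Python sketch under stated assumptions, not HBase client code: `RegionLocator`, `front_end_for` and the server and machine names are hypothetical, and the row key is assumed to be the user id encoded as ASCII digits.

```python
import bisect


class RegionLocator:
    """Toy client-side cache mapping region start keys to region servers.

    Row keys are compared as raw bytes in lexicographic order, and the
    first region must start at the empty key b"" so every row has a region.
    """

    def __init__(self, regions):
        # regions: iterable of (start_key: bytes, server: str) pairs
        ordered = sorted(regions)
        if not ordered or ordered[0][0] != b"":
            raise ValueError("the first region must start at the empty key")
        self._starts = [start for start, _ in ordered]
        self._servers = [server for _, server in ordered]

    def server_for(self, row_key: bytes) -> str:
        # The owning region is the one with the greatest start key <= row_key.
        i = bisect.bisect_right(self._starts, row_key) - 1
        return self._servers[i]


def front_end_for(user_id: int, locator: RegionLocator, machine_of_server: dict) -> str:
    """Hypothetical sticky routing: send a user to the front end ("Client")
    on the machine whose RS serves that user's row."""
    # Decimal-string keys sort lexicographically, so b"10" < b"9".
    row_key = str(user_id).encode("ascii")
    return machine_of_server[locator.server_for(row_key)]


# One region, hosted by Region Server 1: every write goes to rs1,
# even when it originates on machine n.
single = RegionLocator([(b"", "rs1")])
print(single.server_for(b"1111"))  # rs1

# After a split at b"5", suppose the daughter regions end up on different servers.
split = RegionLocator([(b"", "rs1"), (b"5", "rsn")])
machines = {"rs1": "machine1", "rsn": "machinen"}
print(front_end_for(1111, split, machines))  # machine1
print(front_end_for(7777, split, machines))  # machinen
```

Such a mapping goes stale whenever regions split or are moved, so a sticky front end built this way would need to refresh it. A request that lands on the "wrong" machine still succeeds, because the HBase client routes it to the RS serving the region; it only pays an extra network hop.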



-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam