Posted to user@hbase.apache.org by Tao Xie <xi...@gmail.com> on 2010/10/12 03:43:22 UTC

Question regarding data location in hdfs after hbase restarts

Hi all,
I set the HDFS replication factor to 1 when running HBase, and a DataNode
(DN) and a RegionServer (RS) co-exist on each slave node. So the data in the
regions managed by an RS will be stored on its local data node, right?
But when I restart HBase and an HBase client does gets against an RS, the
data is read from remote data nodes. Does that mean that when an RS
restarts, the regions are re-arranged? If so, is HBase clever enough to
re-adjust the regions? I'm not clear about the underlying mechanism, so can
anyone give me some explanation? Thanks.

Re: Question regarding data location in hdfs after hbase restarts

Posted by Stack <st...@duboce.net>.
When you write to HDFS, you write N replicas.  By default, the first
replica is written to the local datanode.  When reading, the DFSClient
will try to read from the most local replica first.

Compactions read from multiple files and write out a single merged
file.  The newly written file's blocks will all be on the local
datanode, barring an anomaly.
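
To make the effect concrete, here is a toy model of it. This is only a
sketch under stated assumptions, not HBase or HDFS code: it assumes a
replication factor of 1 (as in the original question), one datanode per
region server host, and the class, method, and host names (ToyCluster,
flush, compact, node-a, node-b) are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model: replication factor 1, one datanode per region server host."""

    # store file name -> host whose datanode holds its single replica
    file_host: dict = field(default_factory=dict)
    # region name -> names of the store files that make up the region
    region_files: dict = field(default_factory=dict)
    # region name -> host currently serving the region
    region_host: dict = field(default_factory=dict)

    def flush(self, region, file_name):
        """A new file is written by the serving host, so it lands locally."""
        self.file_host[file_name] = self.region_host[region]
        self.region_files.setdefault(region, []).append(file_name)

    def compact(self, region):
        """Merge all store files into one file written by the serving host."""
        for name in self.region_files.get(region, []):
            del self.file_host[name]
        self.region_files[region] = []
        self.flush(region, f"{region}-compacted")

    def locality(self, region):
        """Fraction of the region's files stored on the serving host."""
        files = self.region_files.get(region, [])
        if not files:
            return 1.0  # nothing to read, so nothing is remote
        host = self.region_host[region]
        return sum(self.file_host[f] == host for f in files) / len(files)


cluster = ToyCluster()
cluster.region_host["r1"] = "node-a"
cluster.flush("r1", "f1")
cluster.flush("r1", "f2")
before_restart = cluster.locality("r1")

# A restart hands the region to another host; the files stay where they were.
cluster.region_host["r1"] = "node-b"
after_restart = cluster.locality("r1")

# New writes land on the new host, so locality is mixed until a compaction.
cluster.flush("r1", "f3")
mixed = cluster.locality("r1")

# The compaction rewrites everything through the new host's local datanode.
cluster.compact("r1")
after_compaction = cluster.locality("r1")
```

In this model locality falls to zero right after the region moves and
returns to one once the region is compacted on its new host. With a
replication factor above 1 the picture is softer, since only the first
replica is placed locally and a moved region may still find a replica
nearby.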

St.Ack

On Tue, Oct 12, 2010 at 11:58 AM, Jack Levin <ma...@gmail.com> wrote:
> Ryan, can you elaborate how compactions create data locality?
>
> -Jack
>
>
> On Oct 11, 2010, at 10:12 PM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> We don't attempt to optimize region placement with hdfs locations yet. A
>> reason why is because on a long lived cluster compactions create the
>> locality you are looking for. Furthermore, in the old master such an
>> optimization was really hard to do. The new master should make it easier to
>> write such 1 off hacks.
>> On Oct 11, 2010 9:43 PM, "Tao Xie" <xi...@gmail.com> wrote:
>>> hi, all
>>> I set hdfs replica=1 when running hbase. And DN and RS co-exists on each
>>> slave node. So the data in the regions managed by RS will be stored on its
>>> local data node, rite?
>>> But when I restart hbase and hbase client does gets on RS, datanode will
>>> read data from remote data nodes. Does that mean when RS restart, the
>>> regions are re-arranged? If so, will hbase is clever enough to re-adjust
>>> the regions? I'm not clear about the behind mechanism so anyone can give
>>> me some explanations? Thanks.
>

Re: Question regarding data location in hdfs after hbase restarts

Posted by Jack Levin <ma...@gmail.com>.
Ryan, can you elaborate how compactions create data locality?

-Jack


On Oct 11, 2010, at 10:12 PM, Ryan Rawson <ry...@gmail.com> wrote:

> We don't attempt to optimize region placement with hdfs locations yet. A
> reason why is because on a long lived cluster compactions create the
> locality you are looking for. Furthermore, in the old master such an
> optimization was really hard to do. The new master should make it easier to
> write such 1 off hacks.
> On Oct 11, 2010 9:43 PM, "Tao Xie" <xi...@gmail.com> wrote:
>> hi, all
>> I set hdfs replica=1 when running hbase. And DN and RS co-exists on each
>> slave node. So the data in the regions managed by RS will be stored on its
>> local data node, rite?
>> But when I restart hbase and hbase client does gets on RS, datanode will
>> read data from remote data nodes. Does that mean when RS restart, the
>> regions are re-arranged? If so, will hbase is clever enough to re-adjust
>> the regions? I'm not clear about the behind mechanism so anyone can give
>> me some explanations? Thanks.

Re: Question regarding data location in hdfs after hbase restarts

Posted by Ryan Rawson <ry...@gmail.com>.
We don't attempt to optimize region placement based on HDFS block locations
yet. One reason is that on a long-lived cluster, compactions create the
locality you are looking for. Furthermore, in the old master such an
optimization was really hard to do. The new master should make it easier to
write such one-off hacks.
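
For illustration only, a locality-aware placement of the kind described
above might look roughly like the sketch below. It is not HBase code, and
nothing in this thread says the master works this way. The function names
(locality_index, most_local_host) and the input format (bytes of each
region's blocks per host) are assumptions made up for the example.

```python
def locality_index(bytes_by_host, host):
    """Fraction of a region's block bytes that are stored on `host`."""
    total = sum(bytes_by_host.values())
    return bytes_by_host.get(host, 0) / total if total else 0.0


def most_local_host(bytes_by_host, live_hosts):
    """Pick the live host holding the most bytes of the region's blocks.

    Ties go to the alphabetically first host so the result is deterministic.
    Raises ValueError if `live_hosts` is empty.
    """
    return max(sorted(live_hosts), key=lambda h: bytes_by_host.get(h, 0))


# Hypothetical input: region -> {host: bytes of that region's blocks on host}
region_blocks = {
    "r1": {"node-a": 300, "node-b": 100},
    "r2": {"node-b": 250},
}
live_hosts = ["node-a", "node-b", "node-c"]

# Assign every region to the host that already stores most of its data.
plan = {
    region: most_local_host(blocks, live_hosts)
    for region, blocks in region_blocks.items()
}
r1_locality = locality_index(region_blocks["r1"], plan["r1"])
```

A real assignment would also have to weigh load, since placing regions
purely by locality could pile many regions onto one host.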
On Oct 11, 2010 9:43 PM, "Tao Xie" <xi...@gmail.com> wrote:
> hi, all
> I set hdfs replica=1 when running hbase. And DN and RS co-exists on each
> slave node. So the data in the regions managed by RS will be stored on its
> local data node, rite?
> But when I restart hbase and hbase client does gets on RS, datanode will
> read data from remote data nodes. Does that mean when RS restart, the
> regions are re-arranged? If so, will hbase is clever enough to re-adjust
> the regions? I'm not clear about the behind mechanism so anyone can give
> me some explanations? Thanks.