Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/12/03 16:04:49 UTC

Data Locality, HBase? Or Hadoop?

Hi,

I'm wondering who takes care of data locality. Is it Hadoop? Or HBase?

Let's say I have disabled the load balancer and I'm manually moving a
region to a specific server. Who is going to take care that the data
is going to be on the same datanode as the regionserver I moved the
region to? Is hadoop going to see that my region is now on this region
server and make sure my data is moved there too? Or is hbase going to
ask hadoop to do it?

Or, since I moved it manually, is data locality no longer guaranteed?

Thanks,

JM

Re: Data Locality, HBase? Or Hadoop?

Posted by Kevin O'dell <ke...@cloudera.com>.
I would argue for running a major compaction on the individual region.  It
should be quick, seeing as you are on 0.94.

-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Data Locality, HBase? Or Hadoop?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ok. So each time I move a region manually, I had better run a minor
compaction to make sure the data is moved locally too.

Got it.

Thanks,

JM

RE: Data Locality, HBase? Or Hadoop?

Posted by Anoop Sam John <an...@huawei.com>.
I think all is clear now. Just to conclude: data locality is a feature provided by HDFS. When a DFS client writes data, Hadoop tries to keep that data local to the writer. The HBase region server reads and writes data through a DFS client that runs in the same process as the RS, so when a flush happens, locality is achieved for the flushed data. Later, when the region is moved, whether by the balancer or manually, the existing files stay where they were written. Locality can be regained after a compaction, because the compaction rewrites the data into HDFS (merging many files into one HFile).
A major compaction makes all of the region's data local. A minor compaction rewrites only the files it selects into a new HFile, so only that portion of the data becomes local.  :)
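To illustrate the sequence above (flush, move, minor compaction, major compaction), here is a toy Python model. It is only a sketch and not HBase or HDFS code: the Region and HFile classes and their methods are invented for this example, and it tracks just the writer-local replica of each file, ignoring the other replicas that HDFS would also keep.

```python
from dataclasses import dataclass, field


@dataclass
class HFile:
    size: int        # file size in arbitrary units
    local_host: str  # host that wrote the file (holds the writer-local replica)


@dataclass
class Region:
    host: str                                   # region server currently serving the region
    hfiles: list = field(default_factory=list)  # store files, oldest first

    def flush(self, size):
        # A flush goes through the DFS client inside the region server,
        # so the new file is local to the current host.
        self.hfiles.append(HFile(size, self.host))

    def move(self, new_host):
        # Moving a region changes only the assignment; the files stay put.
        self.host = new_host

    def compact(self, count=None):
        # Rewrite the first `count` files (all files when count is None)
        # into one new HFile written from the current host.
        chosen = self.hfiles if count is None else self.hfiles[:count]
        rest = [] if count is None else self.hfiles[count:]
        merged = HFile(sum(f.size for f in chosen), self.host)
        self.hfiles = [merged] + rest

    def locality(self):
        # Fraction of the region's bytes that are local to its current host.
        total = sum(f.size for f in self.hfiles)
        local = sum(f.size for f in self.hfiles if f.local_host == self.host)
        return local / total if total else 1.0


def demo():
    region = Region("rs1")
    for size in (100, 100, 200):
        region.flush(size)
    steps = [("after flushes", region.locality())]
    region.move("rs2")
    steps.append(("after move", region.locality()))
    region.compact(count=2)   # stands in for a minor compaction
    steps.append(("after minor compaction", region.locality()))
    region.compact()          # stands in for a major compaction
    steps.append(("after major compaction", region.locality()))
    return steps


if __name__ == "__main__":
    for label, value in demo():
        print(f"{label}: {value:.2f}")
```

In this model the move itself never touches the files, which is why locality only comes back once a compaction rewrites them from the new host.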

-Anoop-

Re: Data Locality, HBase? Or Hadoop?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ok. I will try the major compaction then ;)

Doug, thanks for pointing to the doc! I now totally understand why
it's moved locally when the compaction occurs!

Thanks all! I will give that a try very shortly.

JM

Re: Data Locality, HBase? Or Hadoop?

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

This is also discussed in the Regions section in the RefGuide:

http://hbase.apache.org/book.html#regions.arch

9.7.3. Region-RegionServer Locality




Re: Data Locality, HBase? Or Hadoop?

Posted by Kevin O'dell <ke...@cloudera.com>.
JM,

  If you have disabled the balancer and are manually moving regions, you
will need to run a compaction on those regions.  That is the only (logical)
way of bringing the data local.  HDFS does not have a concept of HBase
locality.  HBase locality is all managed through major and minor
compactions.
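
One way to picture which manually moved regions need that compaction is to compare each region's current server with the hosts holding its HDFS block replicas. The Python sketch below is illustrative only: the function names, the input layout, and the 0.9 threshold are assumptions made for this example, not anything HBase or HDFS provides.

```python
def region_locality(block_hosts, region_host):
    """Fraction of blocks that have at least one replica on region_host."""
    if not block_hosts:
        return 1.0  # nothing on disk yet, so nothing is remote
    local = sum(1 for hosts in block_hosts if region_host in hosts)
    return local / len(block_hosts)


def regions_needing_compaction(regions, threshold=0.9):
    """Names of regions whose locality falls below the (hypothetical) threshold.

    `regions` maps a region name to (current host, list of replica-host sets),
    with one set of hosts per HDFS block.
    """
    return sorted(
        name
        for name, (host, block_hosts) in regions.items()
        if region_locality(block_hosts, host) < threshold
    )


if __name__ == "__main__":
    regions = {
        # r1 was never moved: every block has a replica on rs1.
        "r1": ("rs1", [{"rs1", "rs2", "rs3"}, {"rs1", "rs4", "rs5"}]),
        # r2 was moved to rs2: only one of its two blocks happens to be there.
        "r2": ("rs2", [{"rs1", "rs3", "rs4"}, {"rs2", "rs3", "rs4"}]),
    }
    print(regions_needing_compaction(regions))
```

A region can keep partial locality after a move when another replica of a block happens to sit on the new host, which is why this sketch looks at every replica rather than only the writer's copy.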

-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera