Posted to user@hbase.apache.org by bharath vissapragada <bh...@students.iiit.ac.in> on 2009/07/22 06:43:44 UTC

Data Processing in hbase

Hi all,

I have a simple doubt about HBase.

Suppose I use a scanner to iterate through all the rows in an HBase table
and process the data corresponding to those rows. Is the processing of that
data done locally on the region server where each particular region is
located, or is the data transferred over the network so that all the
processing is done on the single machine where the script runs?

thanks

Re: Data Processing in hbase

Posted by Amandeep Khurana <am...@gmail.com>.
On Wed, Jul 22, 2009 at 12:07 AM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

> That means we have to stick to the principle of MR whenever we require
> efficient data processing,
> but MapReduce cannot offer solutions to general database problems, I guess.
>

I'd recommend you read the papers on MR, BigTable, and some of the latest
work on HadoopDB, etc. That will give you clarity.



Re: Data Processing in hbase

Posted by bharath vissapragada <bh...@gmail.com>.
That means we have to stick to the principle of MR whenever we require
efficient data processing,
but MapReduce cannot offer solutions to general database problems, I guess.

On Wed, Jul 22, 2009 at 12:34 PM, Amandeep Khurana <am...@gmail.com> wrote:

> Of course. Network and I/O overheads definitely plague the processing of
> large datasets.

Re: Data Processing in hbase

Posted by Amandeep Khurana <am...@gmail.com>.
On Wed, Jul 22, 2009 at 12:01 AM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

> Suppose I write non-MR code using the Java API that involves processing
> huge data (hundreds of GBs). Is there then an overhead of fetching that
> (huge) amount of data from other machines?


Of course. Network and I/O overheads definitely plague the processing of
large datasets.



Re: Data Processing in hbase

Posted by bharath vissapragada <bh...@gmail.com>.
Suppose I write non-MR code using the Java API that involves processing of
huge data (hundreds of GBs). Is there then an overhead of fetching that
(huge) amount of data from other machines?

On Wed, Jul 22, 2009 at 12:24 PM, Amandeep Khurana <am...@gmail.com> wrote:

> HBase is meant to store large tables. The intention is to store data in a
> way that's more scalable than traditional database systems. Now, HBase is
> built on Hadoop and has the option of being used as the data store for MR
> jobs. However, that's not its only purpose.
>
> In all data storage systems (except embedded databases), you would have to
> fetch data to where computation has to be performed. The whole MR design
> philosophy is to take the code to the data and execute it as close to where
> the data is stored as possible.

Re: Data Processing in hbase

Posted by Amandeep Khurana <am...@gmail.com>.
HBase is meant to store large tables. The intention is to store data in a
way that's more scalable than traditional database systems. Now, HBase is
built on Hadoop and has the option of being used as the data store for MR
jobs. However, that's not its only purpose.

In all data storage systems (except embedded databases), you would have to
fetch data to where the computation has to be performed. The whole MR design
philosophy is to take the code to the data and execute it as close to where
the data is stored as possible.


On Tue, Jul 21, 2009 at 11:48 PM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

> That means it is not very useful to write plain Java code (using the API),
> because it does not use the real power of Hadoop (distributed processing)
> and instead has the overhead of fetching data from other machines, right?

Re: Data Processing in hbase

Posted by bharath vissapragada <bh...@gmail.com>.
I have one more doubt.
In the example given on the site

http://wiki.apache.org/hadoop/Hbase/MapReduce

some jobs are written in such a manner that they have only map classes and
no reduce classes.
What I understood is that a map is generated for every region server and it
operates on the data present in that region server.
Is this idea right?
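
For what it's worth, the map-only pattern can be sketched roughly as below.
This is a hedged sketch from memory of the 0.20-era
org.apache.hadoop.hbase.mapreduce API, not code from the wiki page; the
table name "mytable" and the class names are made-up placeholders. (Note:
TableInputFormat actually creates one split per region, not per region
server.)

```java
// Sketch: a map-only MR job over an HBase table. TableInputFormat creates
// one input split per region, so each map task scans a single region and
// is scheduled as close to that region's server as possible. With zero
// reducers there is no shuffle; each row is processed where it is read.
// Assumes the 0.20-era API; "mytable" is a placeholder table name.
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MapOnlyScan {
    static class RegionMapper
            extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row,
                           Context context)
                throws IOException, InterruptedException {
            // Process one row here, on (or near) the region server that
            // holds it. Nothing is emitted downstream.
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "map-only-scan");
        job.setJarByClass(MapOnlyScan.class);
        TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
                RegionMapper.class, NullWritable.class, NullWritable.class,
                job);
        job.setNumReduceTasks(0);                 // map-only: no reducers
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```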



Re: Data Processing in hbase

Posted by Erik Holstad <er...@gmail.com>.
Hi Bharath!
One of the main benefits of using HBase is that it gives you random access
to your data. The main goal is not to use it for big batch-processing jobs
that go through all (or a lot) of your data, even though the hooks into
MapReduce jobs give you that option.

So whenever you fetch data using get and scan, that data is brought to the
client for you to process it there. When using HBase as the source or sink
of an MR job, this is not the case.

What access patterns do you have for your data: are you doing a lot of
random reads, or mostly batch processing of data?

Regards Erik
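
To make the contrast concrete, here is a minimal sketch of the plain
client-side get/scan path described above, written from memory against the
0.20-era client API (method names may differ in other versions); the table
name "mytable" and family "cf" are placeholders:

```java
// Sketch: a plain client-side scan. Every matching row is shipped over
// the network to THIS process; the region servers do no application-level
// processing. Assumes the 0.20-era client API; "mytable"/"cf" are
// placeholder names.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientSideScan {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("cf"));
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                // The row has already crossed the network; any processing
                // here runs on the client machine, not the region server.
            }
        } finally {
            scanner.close();  // release the server-side scanner lease
        }
    }
}
```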

Re: Data Processing in hbase

Posted by Fernando Padilla <fe...@alum.mit.edu>.
This might be a simplified view, but this is how I understand it.

HBase stores the data in a distributed way, using various RegionServers.

MapReduce distributes computation, using TaskTrackers.

So when a MapReduce job runs, it tries to run the map/reduce operations on
TaskTrackers co-located with the RegionServers serving the data, thus
co-locating the computation with the data.

So, if you use MapReduce, you get computation and data distribution by
default, as well as a best effort to co-locate computation with data, thus
maximizing efficiency as much as possible.

Now, you don't have to use MapReduce, but then you will have to take the
extra effort to distribute your computations yourself and try to co-locate
them close to the data.

That is in fact something I'm planning on doing, since I'm not sure yet
whether my computations are suited for MapReduce. So I will probably run my
own Java process co-located with the HBase RegionServers, and make sure
that when my code asks for data, it gets the local data as much as
possible.




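
The run-your-own-process plan sketched above needs to know where each
region lives. Hedged from memory of the 0.20-era client API (the
HTable.getRegionsInfo call and "mytable" are assumptions that may not match
your version):

```java
// Sketch: ask HBase where each region of a table is served, so a custom
// (non-MR) worker can be placed on, or routed to, the host holding the
// data it will scan. Assumes the 0.20-era client API; "mytable" is a
// placeholder table name.
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionPlacement {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        for (Map.Entry<HRegionInfo, HServerAddress> e : regions.entrySet()) {
            // A scheduler could assign the scan of this key range to a
            // worker running on this host, keeping the reads local.
            System.out.println(e.getValue().getHostname() + " serves ["
                + Bytes.toString(e.getKey().getStartKey()) + ", "
                + Bytes.toString(e.getKey().getEndKey()) + ")");
        }
    }
}
```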

Re: Data Processing in hbase

Posted by bharath vissapragada <bh...@gmail.com>.
That means it is not very useful to write plain Java code (using the API),
because it does not use the real power of Hadoop (distributed processing)
and instead has the overhead of fetching data from other machines, right?

On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana <am...@gmail.com> wrote:

> Yes, only if you use MR. If you are writing your own code, it will pull
> the records to the place where you run the code.

Re: Data Processing in hbase

Posted by Amandeep Khurana <am...@gmail.com>.
Yes, only if you use MR. If you are writing your own code, it will pull the
records to the place where you run the code.

On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla <fe...@alum.mit.edu> wrote:

> That is if you use Hadoop MapReduce, right? Not if you simply access
> HBase through the standard API (e.g., Java)?

Re: Data Processing in hbase

Posted by Fernando Padilla <fe...@alum.mit.edu>.
That is if you use Hadoop MapReduce, right? Not if you simply access
HBase through the standard API (e.g., Java)?


On 7/21/09 9:49 PM, Amandeep Khurana wrote:
> Bharath,
>
> The processing is done as close to the RS as possible: the first attempt
> is to run it locally on the same node; if that's not possible, it's done
> on the same rack.
>
> -ak

Re: Data Processing in hbase

Posted by Amandeep Khurana <am...@gmail.com>.
Bharath,

The processing is done as close to the RS as possible: the first attempt is
to run it locally on the same node; if that's not possible, it's done on the
same rack.

-ak


On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada <
bharat_v@students.iiit.ac.in> wrote:

> Hi all,
>
> I have a simple doubt about HBase.
>
> Suppose I use a scanner to iterate through all the rows in an HBase table
> and process the data corresponding to those rows. Is the processing of
> that data done locally on the region server where each particular region
> is located, or is the data transferred over the network so that all the
> processing is done on the single machine where the script runs?
>
> thanks
>