Posted to user@hbase.apache.org by Dalia Sobhy <da...@hotmail.com> on 2012/11/24 18:15:00 UTC

Hbase MapReduce

Dear all,
I wanted to ask a question: do HBase aggregate functions such as rowcount, getMax, and getAverage use MapReduce to execute?
Thanks :D

Re: Hbase MapReduce

Posted by Thomas Wendzinski <tw...@arcor.de>.
Hello,

> It's weird that HBase aggregate functions don't use MapReduce; this means
> that the performance will be very poor.
> Is it a must to use coprocessors?
> Is there an easier way to improve the functions' performance?
Why would performance be poor? I have not been working with these
coprocessors for long and am still testing a lot, but in my perception
they are much more lightweight. Depending on your row key design and
request range, the load is distributed across the region servers: each
one handles aggregation for its own key range. The client invoking the
coprocessor then merges the results, which can be seen as a sort of
reduce function.
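To make that merge step concrete, here is a minimal, HBase-free sketch in plain Java of how a client might fold per-region partial results into a global average. The region partials and their values are made up for illustration; in real use an endpoint coprocessor would return one such partial per region.

```java
import java.util.Arrays;
import java.util.List;

public class ClientMerge {
    // One partial aggregate, as an endpoint might return it for a single
    // region: the sum and row count over that region's key range.
    public static final class Partial {
        public final long sum;
        public final long count;
        public Partial(long sum, long count) { this.sum = sum; this.count = count; }
    }

    // The client-side "reduce": fold all per-region partials into one average.
    public static double mergeAverage(List<Partial> partials) {
        long sum = 0, count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        // Three hypothetical regions, each having aggregated its own rows.
        List<Partial> fromRegions = Arrays.asList(
                new Partial(10, 2),
                new Partial(30, 3),
                new Partial(20, 5));
        System.out.println(mergeAverage(fromRegions)); // 60 / 10 = 6.0
    }
}
```

The point being: the per-region work runs server-side in parallel, and the client only does this cheap final fold.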

I think one has to think precisely about one's own requirements. I am
not sure how, or even whether, M/R jobs can work for real-time
scenarios. Here, coprocessors seem to be a good alternative, though
they too could be limited, depending on how many rows the coprocessor
needs to iterate over for the kind of data you have and expect to
request.

However, having the data processed by an M/R job beforehand and
persisted would be faster for a single client request, because you
only need to fetch aggregated data that already exists. But how recent
is the data at that point? Does it change frequently, and must
aggregation results be as recent as the data they reflect? That could
be a con against M/R...

Regards
tom





RE: Hbase MapReduce

Posted by Wei Tan <wt...@us.ibm.com>.
Actually, a coprocessor can be used to implement MR-like functionality
without using the Hadoop framework.



Best Regards,
Wei

Wei Tan 
Research Staff Member 
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
wtan@us.ibm.com; 914-784-6752




RE: Hbase MapReduce

Posted by Dalia Sobhy <da...@hotmail.com>.
It's weird that HBase aggregate functions don't use MapReduce; this means that the performance will be very poor.
Is it a must to use coprocessors?
Is there an easier way to improve the functions' performance?


Re: Hbase MapReduce

Posted by Michel Segel <mi...@hotmail.com>.
Do you think it would be a good idea to temper the use of CoProcessors?

This kind of reminds me of when people first started using stored procedures...


Sent from a remote device. Please excuse any typos...

Mike Segel


Re: Hbase MapReduce

Posted by tom <tw...@arcor.de>.
Hi, you do not need to use M/R; you could also use coprocessors.

See this site:
https://blogs.apache.org/hbase/entry/coprocessor_introduction
-> the section "Endpoints"

An aggregation coprocessor that should match your requirements ships
with HBase. You just need to load it, and then you can access it from
HTable:

HTable.coprocessorExec(..) 
<http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29>
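For reference, "loading it" is a configuration step. As a sketch of the hbase-site.xml entry (property and endpoint class name as I recall them from the 0.92/0.94-era docs and the coprocessor introduction post linked above; check your version's documentation):

```xml
<property>
  <!-- Load the bundled aggregation endpoint on every region server -->
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
```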

Regards
tom


Re: Hbase MapReduce

Posted by Marcos Ortiz <ml...@uci.cu>.
Regards, Dalia.
You have to use MapReduce for that.
In the HBase in Practice book there are lots of great examples of this.
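For what it's worth, HBase does ship a MapReduce row-count job (org.apache.hadoop.hbase.mapreduce.RowCounter). Stripped of the framework, the idea reduces to a map step that counts rows per input split and a reduce step that sums the partial counts; a toy, framework-free sketch (split contents are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {
    // "Map": each input split reports how many rows it saw.
    public static long mapSplit(List<String> rows) {
        return rows.size();
    }

    // "Reduce": sum the per-split counts into a global row count.
    public static long reduce(List<Long> partialCounts) {
        return partialCounts.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> split1 = Arrays.asList("row1", "row2");
        List<String> split2 = Arrays.asList("row3", "row4", "row5");
        long total = reduce(Arrays.asList(mapSplit(split1), mapSplit(split2)));
        System.out.println(total); // 5
    }
}
```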



-- 

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>


