
Posted to user@hbase.apache.org by y_...@tsmc.com on 2010/01/29 02:02:39 UTC

Hbase as Map/Reduce source

Hi,

I would like to understand clearly how HBase works as a Map/Reduce source.
Basically, if a table has 100 regions, does that mean 100 map tasks will be
started?
What is the difference between HDFS and HBase as a Map/Reduce source?
Thanks




Fleming Chiu(邱宏明)
707-6128
y_823910@tsmc.com
Meat-free Monday: eat vegetarian, save the Earth (Meat Free Monday Taiwan)


 --------------------------------------------------------------------------- 
                                                         TSMC PROPERTY       
 This email communication (and any attachments) is proprietary information   
 for the sole use of its                                                     
 intended recipient. Any unauthorized review, use or distribution by anyone  
 other than the intended                                                     
 recipient is strictly prohibited.  If you are not the intended recipient,   
 please notify the sender by                                                 
 replying to this email, and then delete this email and any copies of it     
 immediately. Thank you.                                                     
 ---------------------------------------------------------------------------

Re: Hbase as Map/Reduce source

Posted by Kay Kay <ka...@gmail.com>.

HDFS is a double-edged sword. Because it is a raw file system, you can feed 
its files directly to a MapReduce program, although you may need to define 
InputSplits as appropriate to break the input into manageable pieces.
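
To make "chopping down the input" concrete, here is a minimal sketch of 
cutting one file into fixed-size byte ranges. It is a toy model only: the 
class and method names are hypothetical, and it is not Hadoop's 
FileInputFormat, which also takes block size, configuration and record 
boundaries into account.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, NOT Hadoop's FileInputFormat.
class FileSplitSketch {

    /** Returns {offset, length} pairs that together cover fileLength bytes. */
    static List<long[]> splitsFor(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last split may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example sizes: a 200 MB file cut into 64 MB pieces.
        List<long[]> splits = splitsFor(200 * mb, 64 * mb);
        for (long[] s : splits) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
        System.out.println(splits.size() + " splits, i.e. one map task each");
    }
}
```

With the assumed sizes this yields four ranges, the last one 8 MB long. 
Each range would be handed to one map task.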

OTOH, HBase is structured data (well, sort of!) that uses a file format 
on top of HDFS to store the schema, and hence comes with predefined 
InputSplits that make it easy to get started on a MapReduce program. 
From an API-simplicity point of view, HBase can get you started 
relatively faster because of this (assuming your data is already in HBase).

Refer to -
http://wiki.apache.org/hadoop/Hbase/MapReduce .

Although the wiki marks the *.mapred.* packages as deprecated, in practice 
it is suggested to stick with them for some time, since the newer 
*.mapreduce.* packages are not mature enough at this point.

The decision has entirely to do with the kind of data you have and 
whether you can identify it by a primary key amenable to your application, 
which is all HBase in its rudimentary form needs.

On the other hand, if having a schema and defining a primary key for 
your data does not fit your app naturally, you can stick with HDFS 
and a custom InputSplit that depends on your data. HBase provides a lot 
more than HDFS in terms of scanning and row-ID ordering, and if these 
features are not necessary for what you do, then storing the data in 
HDFS should be just fine.

On 1/28/10 6:20 PM, Otis Gospodnetic wrote:
> I asked a similar question recently:
> http://search-hadoop.com/m?id=843956.53875.qm@web50305.mail.re2.yahoo.com||hbase%20mapreduce%20otis%20TableInputFormat
>
>
> Otis

Re: Hbase as Map/Reduce source

Posted by Otis Gospodnetic <ot...@yahoo.com>.

I asked a similar question recently:
http://search-hadoop.com/m?id=843956.53875.qm@web50305.mail.re2.yahoo.com||hbase%20mapreduce%20otis%20TableInputFormat


Otis




Re: Hbase as Map/Reduce source

Posted by ChingShen <ch...@gmail.com>.

Hi,

Please see the link below:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html#getSplits%28org.apache.hadoop.mapreduce.JobContext%29
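
As I read that Javadoc, getSplits produces a number of splits that matches 
the number of regions in the table, so a table with 100 regions would give 
100 map tasks. The sketch below is only a toy model of that idea, with one 
key range per region. The class and method names are hypothetical and are 
not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, NOT the HBase API.
class RegionSplitSketch {

    /** A row-key range [startRow, endRow); an empty string means "unbounded". */
    static final class Split {
        final String startRow;
        final String endRow;

        Split(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    /**
     * Builds one split per region from the sorted region start keys.
     * Each region ends where the next one starts; the last one is open-ended.
     */
    static List<Split> splitsFor(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String end = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1)
                    : "";
            splits.add(new Split(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example: 100 regions, the first with an open start key.
        List<String> startKeys = new ArrayList<>();
        startKeys.add("");
        for (int i = 1; i < 100; i++) {
            startKeys.add(String.format("row%03d", i));
        }
        List<Split> splits = splitsFor(startKeys);
        System.out.println(splits.size() + " regions -> "
                + splits.size() + " map tasks");
    }
}
```

In this model each map task scans exactly the rows of one region, which is 
also why the rows reach each mapper in row-key order.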

Shen
