Posted to user@hbase.apache.org by y_...@tsmc.com on 2010/01/29 02:02:39 UTC
Hbase as Map/Reduce source
Hi,
I would like to understand clearly how HBase works as a Map/Reduce source.
Basically, if a table has 100 regions, does that mean 100 map tasks will
be started?
What is the difference between HDFS and HBase as a Map/Reduce source?
Thanks
Fleming Chiu(邱宏明)
707-6128
y_823910@tsmc.com
Meatless Monday: eat vegetarian and save the Earth (Meat Free Monday Taiwan)
---------------------------------------------------------------------------
TSMC PROPERTY
This email communication (and any attachments) is proprietary information
for the sole use of its
intended recipient. Any unauthorized review, use or distribution by anyone
other than the intended
recipient is strictly prohibited. If you are not the intended recipient,
please notify the sender by
replying to this email, and then delete this email and any copies of it
immediately. Thank you.
---------------------------------------------------------------------------
Re: Hbase as Map/Reduce source
Posted by Kay Kay <ka...@gmail.com>.
HDFS is a double-edged sword. Because it is a raw file system, you can
feed any file in it to a MapReduce program, although you may need to
define InputSplits appropriate to your data to cut the input into
manageable pieces.
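To make the idea of chopping file input into splits concrete, here is a
minimal sketch in plain Java. It is a simplified model and not Hadoop's
own code: the class name FileSplitSketch and the method computeSplits are
made up for illustration, and the extra rules a real input format applies
(such as minimum and maximum split sizes or record boundaries) are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class FileSplitSketch {

    /**
     * Cuts a file of fileLength bytes into consecutive {offset, length}
     * ranges of at most splitSize bytes. Each range would feed one map task.
     * Hypothetical helper, not part of the Hadoop API.
     */
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed sizes for illustration: a 1 GB file cut into 64 MB pieces.
        List<long[]> splits = computeSplits(1024 * mb, 64 * mb);
        System.out.println(splits.size() + " splits, one map task each");
    }
}
```

With the assumed sizes above, 1 GB divided by 64 MB gives 16 ranges, so a
job over that file would start 16 map tasks under this model.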
HBase, on the other hand, holds structured data (well, sort of!). It uses
a file format on top of HDFS to store the schema, and so it comes with
predefined InputSplits that make it easy to get started on a MapReduce
program. From an API-simplicity point of view, HBase can therefore get
you started relatively quickly (assuming your data is already in HBase).
Refer to http://wiki.apache.org/hadoop/Hbase/MapReduce .
Although the wiki marks the *.mapred.* packages as deprecated, in
practice it is suggested to stick with them for some time, since the
newer *.mapreduce.* packages are not mature enough at this point.
The decision has entirely to do with the kind of data you have and
whether you can identify each record by a primary key that suits your
application, which is all HBase in its rudimentary form needs.
On the other hand, if having a schema and defining a primary key for
your data seems like a poor fit for your app, you can stick with HDFS
and a custom InputSplit that depends on your data. HBase provides a lot
more than HDFS in terms of scanning and row-ID ordering, but if those
features are not necessary for what you do, then storing data in HDFS
should be just about OK.
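As a rough illustration of what scanning and row-ID ordering buy you, the
sketch below models a table as a sorted map from row key to value. This is
an assumption made for illustration: java.util.TreeMap stands in for the
table, the class RowKeyScanSketch and the "user-NNN" keys are made up, and
none of this is the HBase client API. The range is start-inclusive and
stop-exclusive.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RowKeyScanSketch {

    /** Returns the rows with startRow <= key < stopRow, in key order. */
    static SortedMap<String, String> scan(TreeMap<String, String> table,
                                          String startRow, String stopRow) {
        // Because keys are kept sorted, a range is a cheap view of the map
        // rather than a pass over every row.
        return table.subMap(startRow, stopRow);
    }

    /** Builds a toy table with hypothetical row keys user-000, user-001, ... */
    static TreeMap<String, String> sampleTable(int n) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            table.put(String.format("user-%03d", i), "value-" + i);
        }
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable(1000);
        SortedMap<String, String> slice = scan(table, "user-100", "user-200");
        System.out.println(slice.size() + " rows, from " + slice.firstKey()
                + " to " + slice.lastKey());
    }
}
```

A flat file in HDFS has no such ordering, so finding the same rows there
would mean reading the whole input unless you build an index yourself.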
On 1/28/10 6:20 PM, Otis Gospodnetic wrote:
> I asked a similar question recently:
> http://search-hadoop.com/m?id=843956.53875.qm@web50305.mail.re2.yahoo.com||hbase%20mapreduce%20otis%20TableInputFormat
>
>
> Otis
Re: Hbase as Map/Reduce source
Posted by Otis Gospodnetic <ot...@yahoo.com>.
I asked a similar question recently:
http://search-hadoop.com/m?id=843956.53875.qm@web50305.mail.re2.yahoo.com||hbase%20mapreduce%20otis%20TableInputFormat
Otis
Re: Hbase as Map/Reduce source
Posted by ChingShen <ch...@gmail.com>.
Hi,
Please see the link below:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html#getSplits%28org.apache.hadoop.mapreduce.JobContext%29
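The idea behind that getSplits method can be modelled roughly as follows.
This is a sketch under a stated assumption, namely that the input format
produces one split per region, and it is not the HBase source: the classes
Region and Split and the helper evenRegions are hypothetical stand-ins, and
the host names and row keys are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplitSketch {

    /** Hypothetical stand-in for a region: a row-key range and its server. */
    static final class Region {
        final String startKey;
        final String endKey;
        final String host;

        Region(String startKey, String endKey, String host) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.host = host;
        }
    }

    /** Hypothetical stand-in for an input split handed to one map task. */
    static final class Split {
        final String startRow;
        final String endRow;
        final String location;

        Split(String startRow, String endRow, String location) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.location = location;
        }
    }

    /** Assumed behaviour: every region becomes exactly one split. */
    static List<Split> getSplits(List<Region> regions) {
        List<Split> splits = new ArrayList<>(regions.size());
        for (Region r : regions) {
            // The split keeps the region's host so the map task can be
            // scheduled close to the data.
            splits.add(new Split(r.startKey, r.endKey, r.host));
        }
        return splits;
    }

    /** Builds a toy table of n regions; "" marks an open-ended boundary. */
    static List<Region> evenRegions(int n) {
        List<Region> regions = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String start = (i == 0) ? "" : String.format("row-%03d", i);
            String end = (i == n - 1) ? "" : String.format("row-%03d", i + 1);
            regions.add(new Region(start, end, "host-" + (i % 5)));
        }
        return regions;
    }

    public static void main(String[] args) {
        List<Split> splits = getSplits(evenRegions(100));
        System.out.println(splits.size() + " splits for 100 regions");
    }
}
```

Under that assumption, a table with 100 regions yields 100 splits, and the
framework starts one map task per split.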
Shen