
Posted to user@hbase.apache.org by "leiwangouc@gmail.com" <le...@gmail.com> on 2014/08/11 13:14:20 UTC

How to get specific rowkey from hbase

Hi, 

    I have an input of about 10M records; each record is a rowkey in HBase.
    How can I get this data from HBase with a MapReduce job?
    
Thanks,
Lei


leiwangouc@gmail.com

Re: Re: How to get specific rowkey from hbase

Posted by Esteban Gutierrez <es...@cloudera.com>.

You can do that via
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get(java.util.List).
Basically, you point the HTable (via setTable in TableInputFormat) at the
table with the new users for the time range you are looking at, and use the
result to build the list that will be fed into HTable.get(List<Get>). But
instead of reading any data from the input split, you will be fetching data
via this list of new users. The same should be necessary for updating the
rows via HTable.put(List<Put>).
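
A minimal sketch of the batching step described above, in plain Java with no
HBase dependency. The class name RowKeyBatcher, the method toBatches, and the
batch size are illustrative assumptions, not part of the HBase API. In an
actual mapper, each batch of row keys would be converted to a List<Get> and
passed to HTable.get(List<Get>) in one call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits row keys into fixed-size batches; each batch would back one multi-get. */
final class RowKeyBatcher {

    /** Returns consecutive sublists of at most batchSize keys, preserving order. */
    static List<List<String>> toBatches(List<String> rowKeys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rowKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rowKeys.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(rowKeys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("u1", "u2", "u3", "u4", "u5");
        // With batchSize = 2 this prints [u1, u2], then [u3, u4], then [u5].
        for (List<String> batch : toBatches(keys, 2)) {
            // Hypothetical next step: build a List<Get> from the batch and
            // call HTable.get(List<Get>) once for the whole batch.
            System.out.println(batch);
        }
    }
}
```

Batching keeps the number of round trips well below one per row key; a
suitable batch size depends on row size and cluster, so the value 2 here is
only for illustration.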

regards,
esteban.




--
Cloudera, Inc.



On Mon, Aug 11, 2014 at 6:10 AM, leiwangouc@gmail.com <le...@gmail.com>
wrote:

>
> Actually, I mean how to do random Gets in MapReduce, not a scan.
>
> Let me give a detailed description of my requirement:
> There is an HBase table that contains all the users (about 2G) we have
> collected, and the rowkey is the user id.
> Every hour some new user info arrives (5M~10M).
> For every incoming user, get (HBase Get) the info from HBase, merge it with
> the current hour's info, and put it back into HBase. (If the user does not
> exist in HBase, just use this hour's info.)
>
> Right now the get step runs on one machine; I want to distribute it
> with MapReduce.
>
>
>
> leiwangouc@gmail.com

Re: Re: How to get specific rowkey from hbase

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.

Actually, I mean how to do random Gets in MapReduce, not a scan.

Let me give a detailed description of my requirement:
There is an HBase table that contains all the users (about 2G) we have collected, and the rowkey is the user id.
Every hour some new user info arrives (5M~10M).
For every incoming user, get (HBase Get) the info from HBase, merge it with the current hour's info, and put it back into HBase. (If the user does not exist in HBase, just use this hour's info.)

Right now the get step runs on one machine; I want to distribute it with MapReduce.
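
To make the per-user step concrete, here is a rough sketch of the merge in
plain Java, with no HBase dependency. The class UserInfoMerger, the use of
Map<String, String> as a stand-in for a row's columns, and the rule that this
hour's values override stored ones are all assumptions for illustration; the
actual merge rule is not specified in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Merges a stored user record with the record that arrived this hour. */
final class UserInfoMerger {

    /**
     * Returns the record to write back.
     * existing is null when the user is not yet in the table; in that case
     * only this hour's info is used. Assumption: on conflicting columns the
     * hourly value wins.
     */
    static Map<String, String> merge(Map<String, String> existing,
                                     Map<String, String> hourly) {
        Map<String, String> merged = new HashMap<>();
        if (existing != null) {
            merged.putAll(existing);
        }
        // Applied second, so hourly values replace stored ones.
        merged.putAll(hourly);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("city", "A");
        stored.put("visits", "3");

        Map<String, String> hourly = new HashMap<>();
        hourly.put("visits", "4");

        Map<String, String> merged = merge(stored, hourly);
        System.out.println(merged.get("city"));   // stored value is kept
        System.out.println(merged.get("visits")); // hourly value replaces it

        // New user: nothing stored, so the result is just this hour's info.
        System.out.println(merge(null, hourly).equals(hourly));
    }
}
```

In a MapReduce job this function would sit between the Get and the Put for
each incoming user id.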

leiwangouc@gmail.com

From: Shahab Yunus
Date: 2014-08-11 20:10
To: user@hbase.apache.org
Subject: Re: How to get specific rowkey from hbase
You can use the utility classes already provided. Note that it won't be very
fast, and you might want to try bulk import as well (especially if this is a
one-time or rare occurrence). It depends on your use case. Check out the
documentation below:

For the HBase MapReduce utilities:
http://hbase.apache.org/book/mapreduce.example.html
http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/

For HBase bulk import:
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/

Regards,
Shahab


Re: How to get specific rowkey from hbase

Posted by Shahab Yunus <sh...@gmail.com>.

You can use the utility classes already provided. Note that it won't be very
fast, and you might want to try bulk import as well (especially if this is a
one-time or rare occurrence). It depends on your use case. Check out the
documentation below:

For the HBase MapReduce utilities:
http://hbase.apache.org/book/mapreduce.example.html
http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/

For HBase bulk import:
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/

Regards,
Shahab

On Mon, Aug 11, 2014 at 7:14 AM, leiwangouc@gmail.com <le...@gmail.com>
wrote:

>
> Hi,
>
>     I have an input of about 10M records; each record is a rowkey in HBase.
>     How can I get this data from HBase with a MapReduce job?
>
> Thanks,
> Lei
>
>
> leiwangouc@gmail.com
>