Posted to user@hbase.apache.org by Vijay <vi...@gmail.com> on 2009/06/04 21:47:16 UTC

Question regarding MR for Hbase

Hello everyone,
I want to write a MapReduce job over an HBase table. The table has about a
million records, and the job should scan through them, pull out the data,
and find the 90th percentile of the result. It would be helpful to hear
from anyone who has tried this before.
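For the percentile step itself, here is a minimal sketch in plain Java (no HBase or Hadoop classes; `Percentile` and `nearestRank` are hypothetical names). It uses the nearest-rank definition: sort the n values and take the one at 1-based rank ceil(p * n / 100). In a MapReduce job this would most naturally run in the reduce step, for example with every mapper emitting its value under a single key so that one reducer sees all of them; that job layout is an assumption, not something stated in this thread.

```java
import java.util.Arrays;

/**
 * Nearest-rank percentile over an in-memory array of values.
 * Illustrative sketch only: it assumes all values fit in memory
 * (a million longs is about 8 MB) and uses the nearest-rank
 * definition, which is one of several common percentile definitions.
 */
public class Percentile {

    /** Returns the smallest value v such that at least `percent`% of the values are <= v. */
    public static long nearestRank(long[] values, int percent) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        // 1-based rank = ceil(percent * n / 100), computed in integer arithmetic
        long rank = ((long) percent * sorted.length + 99) / 100;
        return sorted[(int) rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for one numeric column read from the table.
        long[] values = new long[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = values.length - i; // 1,000,000 down to 1
        }
        System.out.println(nearestRank(values, 90)); // prints 900000
    }
}
```

Funnelling everything through one reducer keeps the logic simple; at a million values the sort is cheap, but for much larger tables that single task would become the bottleneck.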

The thing I'm finding hard to understand is this: how does the MR job get
its input? I mean, how do I get the scanner output into the MR job? I don't
want to fetch a million rows first and then hand them to the MR job; I need
the region servers' data to go directly to the MR job.

I saw the example at
http://www.nabble.com/Re:-Map-Reduce-over-HBase---sample-code-p18253120.html
but still couldn't figure it out. :(

Thanks in advance!

Regards,
</VJ>

Re: Question regarding MR for Hbase

Posted by Vijay <vi...@gmail.com>.
Hmm... I think it's only me. I'll try it at home this evening or so, but
basically it doesn't work. :)
Thanks and Regards,
</VJ>




On Fri, Jun 5, 2009 at 1:58 PM, Erik Holstad <er...@gmail.com> wrote:

> I think I have:
> irc.freenode.org
>
> not really sure if it makes a difference
>
> Erik
>

Re: Question regarding MR for Hbase

Posted by Erik Holstad <er...@gmail.com>.
I think I have:
irc.freenode.org

Not really sure if it makes a difference.

Erik

Re: Question regarding MR for Hbase

Posted by stack <st...@duboce.net>.
It's working for us... are you going to the right place?

Channel is '#hbase' and server is 'irc.freenode.net'

St.Ack

On Fri, Jun 5, 2009 at 9:55 AM, Vijay <vi...@gmail.com> wrote:

> BTW: is the HBase IRC channel working, or is it only me?

Re: Question regarding MR for Hbase

Posted by Vijay <vi...@gmail.com>.
Thanks Billy and Erik, I've got something working now. The Export utility
actually helped (should we add it to the wiki?). Now I'm trying to get the
response back to the servlet.
BTW: is the HBase IRC channel working, or is it only me?

Regards,
</VJ>




On Thu, Jun 4, 2009 at 11:08 PM, Billy Pearson
<sa...@pearsonwholesale.com> wrote:

> Also take a look at
> TableMapReduceUtil.
>
> It's in the API docs for 0.19 and 0.20.
>
> Billy

Re: Question regarding MR for Hbase

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Also take a look at
TableMapReduceUtil.

It's in the API docs for 0.19 and 0.20.

Billy

"Vijay" <vi...@gmail.com> wrote in message 
news:9b40bc2a0906041247j7d25f5a4y61351200ae2bb61f@mail.gmail.com...
> How does the MR job get its input? I mean, how do I get the scanner
> output into the MR job?



Re: Question regarding MR for Hbase

Posted by Erik Holstad <er...@gmail.com>.
Hey Vijay!

Have a look at:
https://issues.apache.org/jira/browse/HBASE-974

It's not the best-written code (it was among the first MR jobs we wrote), but
it gets the job done. You don't have to do the split yourself unless you want
to; it is done for you.
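To picture what "done for you" means: the table input format derives the splits from the table's region boundaries (in the 0.20 `mapreduce` API, one split per region; the older `mapred` version can group adjacent regions when fewer map tasks are requested), and each map task opens its own scan over only its split's key range, so no single client pulls all million rows. The toy model below is plain Java, not HBase code: a `TreeMap` stands in for the row-key-sorted table, and the region start keys and the `RegionSplits`/`Split` names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Toy model (not HBase code): a row-key-sorted "table" partitioned into
 * regions by start key. One split per region covers [startKey, endKey),
 * and each simulated map task scans only the rows in its own split.
 */
public class RegionSplits {

    /** A key range; an empty endKey means "through the end of the table". */
    public static final class Split {
        public final String startKey;
        public final String endKey;

        public Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Builds one split per region from sorted start keys (the first region starts at ""). */
    public static List<Split> splitsFromStartKeys(List<String> startKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            String end = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : "";
            splits.add(new Split(startKeys.get(i), end));
        }
        return splits;
    }

    /** The rows one map task would see: start key inclusive, end key exclusive. */
    public static SortedMap<String, String> scan(TreeMap<String, String> table, Split split) {
        return split.endKey.isEmpty()
                ? table.tailMap(split.startKey)
                : table.subMap(split.startKey, split.endKey);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put(String.format("row%02d", i), "value" + i);
        }
        // Hypothetical region boundaries giving three regions.
        List<Split> splits = splitsFromStartKeys(List.of("", "row03", "row07"));
        for (Split s : splits) {
            // In a real job each of these scans would run in a separate map task.
            System.out.println("[" + s.startKey + ", " + s.endKey + ") -> "
                    + scan(table, s).keySet());
        }
    }
}
```

The ranges never overlap and together cover every row, which is why the job can read the whole table without anyone fetching it up front.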

So basically you should just be able to run the code in the issue. I haven't
tried it in a couple of weeks, though, so if you have any trouble getting it
going, feel free to write another email or join us on IRC.

Erik

Re: Question regarding MR for Hbase

Posted by Vijay <vi...@gmail.com>.
Thanks Erik,
After a while of reading I got the concept
(http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html), but
the problem is the implementation. I tried the following:

I extended TableInputFormatBase and then added the filter and the base
class.

Then:

        InputSplit[] ts = tif.getSplits(job, numSplits);

Now I have the input splits that will be provided to the mapper. What do I
have to pass as the input format?

        jobConf.setInputFormat(????);
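The missing piece in this snippet is probably that Hadoop's old mapred API expects the input format class rather than precomputed splits: `JobConf.setInputFormat(Class)` records the class, and when the job is submitted the framework instantiates it, calls `getSplits` itself, and hands each split to a map task, which then asks the same input format for a record reader. Read that way, the argument here would presumably be the `TableInputFormatBase` subclass itself, with no manual `getSplits` call. The sketch below models only that control flow in plain Java; `ToyInputFormat`, `ToyTableInputFormat` and `runJob` are hypothetical names, not Hadoop or HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the InputFormat contract (plain Java, not Hadoop code).
 * The job is configured with an input format class; the "framework"
 * instantiates it, asks it for splits, and feeds each split's records
 * to a map function. The caller never computes the splits itself.
 */
public class InputFormatModel {

    /** Hypothetical stand-in for Hadoop's InputFormat interface. */
    public interface ToyInputFormat {
        List<String> getSplits(int numSplits);

        List<String> readRecords(String split);
    }

    /** Hypothetical table-backed format: one split per "region". */
    public static class ToyTableInputFormat implements ToyInputFormat {
        @Override
        public List<String> getSplits(int numSplits) {
            // numSplits is only a hint; this toy always returns one split per region.
            return List.of("region-a", "region-b", "region-c");
        }

        @Override
        public List<String> readRecords(String split) {
            return List.of(split + "/row1", split + "/row2");
        }
    }

    /** Stand-in for the framework side of job submission and execution. */
    public static List<String> runJob(Class<? extends ToyInputFormat> formatClass, int numSplits) {
        ToyInputFormat format;
        try {
            // The framework creates the input format reflectively from the configured class.
            format = formatClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + formatClass, e);
        }
        List<String> mapped = new ArrayList<>();
        for (String split : format.getSplits(numSplits)) {
            // On a cluster, each split would become its own map task.
            for (String record : format.readRecords(split)) {
                mapped.add("mapped:" + record); // trivial "map" function
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Analogous to jobConf.setInputFormat(SomeFormat.class): pass the class, not the splits.
        System.out.println(runJob(ToyTableInputFormat.class, 3));
    }
}
```

Because the input format is created by reflection, a real subclass would need a no-argument constructor and would presumably read its table and column settings from the job configuration rather than from constructor arguments.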


Sorry for asking this question again; I googled it a lot, but it didn't help.

Regards,
</VJ>




On Thu, Jun 4, 2009 at 4:23 PM, Erik Holstad <er...@gmail.com> wrote:

> but the concept will be the same
>

Re: Question regarding MR for Hbase

Posted by Erik Holstad <er...@gmail.com>.
Hey Vijay!
You can have a look at
http://wiki.apache.org/hadoop/Hbase/MapReduce
That might make things easier to understand. Just remember that the new API
for 0.20 will look different, but the concept will be the same.

Erik