Posted to user@gora.apache.org by al...@aim.com on 2013/02/08 23:25:55 UTC

accelerate a few steps in gora-hbase


Hello,

I use Gora with hbase-0.92 and nutch-2.1. The map phases of all Nutch jobs (generate, fetch, and update) become very slow as the number of records in the HBase table grows.
Most of that time and those resources are simply wasted while a mapper iterates over rows to choose the correct ones.
 
I was told the following on the HBase mailing list:



1. In 0.94, there is an optimization in StoreFileScanner.requestSeek() where a
real seek is only done when seekTimestamp > maxTimestampInFile.

2. Use a time-range scan.

I would like to know whether these features can be implemented in gora-hbase. If so, can someone point me to the class where I should make the changes?
Also, I would like to know which class and method supply the key/value pairs to the map functions in Nutch.
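
To illustrate why a time-range scan can help, here is a toy sketch in plain Java. It is not HBase or Gora code: the StoreFile class and the filesToRead method are hypothetical names made up for this example. It assumes each store file records the oldest and newest cell timestamp it contains, and that the range is half-open, [minStamp, maxStamp), which is how the HBase client documents Scan.setTimeRange(minStamp, maxStamp). Files whose timestamps cannot overlap the requested range are skipped without being read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are HBase or Gora classes.
final class TimeRangeScanSketch {

    /** Hypothetical stand-in for a store file that tracks its oldest and newest cell timestamps. */
    static final class StoreFile {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;

        StoreFile(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Returns the names of the files that may hold cells in [minStamp, maxStamp). */
    static List<String> filesToRead(List<StoreFile> files, long minStamp, long maxStamp) {
        List<String> selected = new ArrayList<>();
        for (StoreFile file : files) {
            // A file is relevant only if its timestamp span overlaps the requested range.
            boolean overlaps = file.maxTimestamp >= minStamp && file.minTimestamp < maxStamp;
            if (overlaps) {
                selected.add(file.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<StoreFile> files = Arrays.asList(
                new StoreFile("old", 100L, 200L),
                new StoreFile("mid", 400L, 600L),
                new StoreFile("recent", 900L, 1000L));

        // Only rows touched at or after timestamp 500 are of interest here.
        System.out.println(filesToRead(files, 500L, 2000L)); // prints [mid, recent]
    }
}
```

Where such a range would be set inside gora-hbase, and which timestamps a Nutch job should pass, are open questions that this sketch does not answer.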

Thanks.
Alex.

 
 

Re: accelerate a few steps in gora-hbase

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Alex,

On Fri, Feb 8, 2013 at 2:25 PM, <al...@aim.com> wrote:

>     Hello,
>
> I use gora with hbase-0.92 and nutch-2.1. All map phases of nutch jobs,
> generate, fetch  and update become very slow by the increase of records
> in hbase table.
> Most of the time it is simply waste of time and resources when a mapper
> iterates over rows to choose the correct one.
>

This is an issue we've been aware of for some time, and fixing it would be
really beneficial indeed! Can you please check the following issues and add
your input wherever you see fit? It would really help move things along.
https://issues.apache.org/jira/browse/GORA-141
https://issues.apache.org/jira/browse/GORA-117
https://issues.apache.org/jira/browse/GORA-119

In particular GORA-119 has something which you may be able to patch against
trunk.

On the other two points, can we rebase once you've seen the above?
Thank you
Lewis