Posted to user@hbase.apache.org by David Quigley <dq...@gmail.com> on 2014/04/09 04:30:30 UTC

Efficiently scan table segments with coprocessor

Hi,

We have a short and tall HBase table structure where a single user's data
is stored across a set of rows. New events are inserted into a
user's data, and the sort order of the rows handles the structure of the
user's data. Most of our user base is inactive at any given time. When a
user's data is updated, we want to run a computation on that user's data and
update the user's stats. Because so few users are active at once, running a
full scan is very inefficient. We basically want to stream updates every time
a user takes an action.

I am thinking that the best way to do this is through a RegionObserver
coprocessor, using either prePut or postPut. The only problem is that we
would need to instantiate a new scan each time prePut or postPut is called,
which might be inefficient. Since HBase already has a pointer to the user's
data via the Put, is there any way to leverage this to scan that local data
more efficiently than using a new InternalScanner?
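
To make the question concrete, here is a rough sketch of how the per-Put scan
could at least be bounded to one user's rows. It is plain Java with no HBase
dependency, and it only computes the [start, stop) row-key range for a prefix.
The class name UserRowRange, the helper userPrefix, and the key layout
(user id followed by '|') are all hypothetical; our real keys may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Computes the [start, stop) row-key range covering every row that shares a prefix. */
public final class UserRowRange {
    private UserRowRange() {}

    /** Start row: the prefix itself, which sorts before any longer key beginning with it. */
    public static byte[] startRow(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Exclusive stop row: the smallest key that sorts after every key beginning
     * with the prefix (unsigned byte-wise ordering). Trailing 0xFF bytes are
     * dropped and the last remaining byte is incremented. An empty array is
     * returned when the prefix is empty or all 0xFF, meaning "no upper bound".
     */
    public static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    /** Assumed (hypothetical) key layout: userId + '|' + event suffix. */
    public static byte[] userPrefix(String userId) {
        return (userId + "|").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] prefix = userPrefix("user42");
        // prints "user42|"
        System.out.println(new String(startRow(prefix), StandardCharsets.UTF_8));
        // prints "user42}" because '|' (0x7C) incremented is '}' (0x7D)
        System.out.println(new String(stopRow(prefix), StandardCharsets.UTF_8));
    }
}
```

Inside postPut, the prefix would presumably be derived from the Put's row key,
and the two byte arrays passed to Scan.setStartRow and Scan.setStopRow so the
scan touches only that user's rows. This does not avoid creating a scanner per
Put; it only keeps each one small.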

Thanks,
Dave

Re: Efficiently scan table segments with coprocessor

Posted by Ted Yu <yu...@gmail.com>.
> a single user's data is stored across a set of rows

Have you looked at HBASE-9488 'Improve performance for small scan'?

Cheers

