
Posted to user@hbase.apache.org by Navdeep Agrawal <Na...@symantec.com> on 2015/05/14 14:37:37 UTC

cell level coprocessor

Hi,
I am trying to use a coprocessor to run an aggregation (e.g. top-N) over all versions of a cell and return the result. Most of the coprocessor aggregation implementations I have found operate on a whole column. How can I do this for every cell in that column? Any ideas or links?

Use case: given a row key and column, I want to aggregate over all versions of the cell and return a single value for that cell.


Thanks,
Navdeep

Re: cell level coprocessor

Posted by navdeep <Na...@symantec.com>.

Hey, thank you very much, Alex, for pointing me in the right direction.




Re: cell level coprocessor

Posted by Alex Baranau <al...@gmail.com>.

Hi Navdeep,

I believe you will need to:
* implement a RegionScanner that applies the aggregation at the Cell level
* extend BaseRegionObserver to force the use of your RegionScanner in the
get and scan hooks (preGetOp and preScannerOpen/postScannerOpen in 0.98)

I don't have a simple example in front of me, but maybe the following will
give you some pointers. We use versions of a Cell to store delta values
when performing append-style increments (you put the delta in the next
version of a cell instead of incrementing the existing value). Then, during
scanning, those deltas are summed up into a single value. I assume you want
to do something along those lines, so you may learn something from that code.
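
To make the append-style increment idea concrete, here is a minimal sketch in
plain Java. It does not use the HBase API: the Version class and the
increment/read helpers are hypothetical names invented for illustration, and
a list stands in for the stored versions of one cell.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaSum {
    /** Hypothetical stand-in for one stored version of a cell. */
    static final class Version {
        final long timestamp;
        final long delta;

        Version(long timestamp, long delta) {
            this.timestamp = timestamp;
            this.delta = delta;
        }
    }

    /** Append-style increment: write the delta as a new version, never read-modify-write. */
    static void increment(List<Version> versions, long timestamp, long delta) {
        versions.add(new Version(timestamp, delta));
    }

    /** Read-time aggregation: fold every version of the cell into a single value. */
    static long read(List<Version> versions) {
        long sum = 0;
        for (Version v : versions) {
            sum += v.delta;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Version> cell = new ArrayList<>();
        increment(cell, 1L, 5);
        increment(cell, 2L, 3);
        increment(cell, 3L, -2);
        System.out.println(read(cell)); // prints 6
    }
}
```

The write path stays cheap because it only appends; the cost moves to the
read path, which is where the custom scanner comes in.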

Here's the RegionScanner implementation [1]. Note that in next(List<Cell>
cells, int limit) you'll need to check for crossing the boundary of the
cell (for example, the cells given to you may contain 3 versions of the cell
in column1 followed by 2 versions of the cell in column2).
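
The boundary check can be sketched as follows. This is plain Java rather than
the HBase API: SimpleCell, sameCellAs, withValue and collapse are hypothetical
names, values are plain longs, and the input is assumed to arrive the way a
scanner returns it, with all versions of one row and column adjacent. The
aggregation is passed in as a function, so a sum, a max, or something else
can be used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongBinaryOperator;

public class VersionCollapser {
    /** Hypothetical simplified cell: row, column, timestamp and a numeric value. */
    static final class SimpleCell {
        final String row;
        final String column;
        final long timestamp;
        final long value;

        SimpleCell(String row, String column, long timestamp, long value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** True when the other cell is another version of the same row and column. */
        boolean sameCellAs(SimpleCell other) {
            return row.equals(other.row) && column.equals(other.column);
        }

        SimpleCell withValue(long newValue) {
            return new SimpleCell(row, column, timestamp, newValue);
        }
    }

    /** Collapses each run of adjacent versions of the same cell into one aggregated cell. */
    static List<SimpleCell> collapse(List<SimpleCell> cells, LongBinaryOperator aggregate) {
        List<SimpleCell> out = new ArrayList<>();
        SimpleCell head = null; // first version seen in the current run
        long acc = 0;
        for (SimpleCell c : cells) {
            if (head != null && head.sameCellAs(c)) {
                acc = aggregate.applyAsLong(acc, c.value); // still inside the same cell
                continue;
            }
            if (head != null) {
                out.add(head.withValue(acc)); // boundary crossed: emit the finished run
            }
            head = c;
            acc = c.value;
        }
        if (head != null) {
            out.add(head.withValue(acc)); // emit the last run
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleCell> batch = Arrays.asList(
            new SimpleCell("row1", "column1", 3, 4),
            new SimpleCell("row1", "column1", 2, 1),
            new SimpleCell("row1", "column1", 1, 2),
            new SimpleCell("row1", "column2", 2, 10),
            new SimpleCell("row1", "column2", 1, 5));
        for (SimpleCell c : collapse(batch, Long::sum)) {
            System.out.println(c.column + "=" + c.value); // column1=7, then column2=15
        }
    }
}
```

This sketch only handles boundaries inside one batch. A real scanner may also
have to carry the unfinished run over to the next call when the limit splits
one cell's versions across two batches.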

Here's the BaseRegionObserver implementation [2].

On a side note, be sure not to overuse the versions of a Cell. Often, using
columns is a better schema design [3].

Cheers,
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase

[1]
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java

[2]
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java

[3] http://hbase.apache.org/book.html#schema.versions
