Posted to user@hbase.apache.org by Navdeep Agrawal <Na...@symantec.com> on 2015/05/14 14:37:37 UTC
cell level coprocessor
Hi,
I am trying to use a coprocessor to do some aggregations (e.g. top-N) over all versions of a cell and return the result. Most of the coprocessor aggregation implementations I have found operate on a whole column. How can we do this for every cell in that column? Any ideas or links?
Use case: given a row key and a column, I want to run an aggregation over all versions of that cell and return a single value for it.
Thanks,
Navdeep
Re: cell level coprocessor
Posted by navdeep <Na...@symantec.com>.
Hey, thank you very much, Alex, for pointing me in the right direction.
Re: cell level coprocessor
Posted by Alex Baranau <al...@gmail.com>.
Hi Navdeep,
I believe you will need to:
* implement a RegionScanner that applies the aggregation at the Cell level
* extend BaseRegionObserver so that your RegionScanner is used in preGet
and preScan (in HBase 0.98 the corresponding hooks are named preGetOp and
preScannerOpen)
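
To make the Cell-level aggregation step concrete, here is a minimal sketch of the top-N case from the original question. It is plain Java with no HBase dependency: the class name CellVersionTopN and the assumption that every version of the cell has already been decoded to a long are illustrative only, not part of the HBase API or of the code linked in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration; not an HBase class.
final class CellVersionTopN {

    // Returns the n largest values among all versions of one cell,
    // ordered from largest to smallest.
    static List<Long> topN(List<Long> versionValues, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap that never holds more than n values.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (Long value : versionValues) {
            heap.offer(value);
            if (heap.size() > n) {
                heap.poll(); // discard the smallest value seen so far
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

Inside a custom scanner, something like this would be applied to the values of all versions of one cell before a single aggregated Cell is emitted.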
I don't have a simple example in front of me, but maybe the following will
give you some pointers. We use versions of a Cell to store delta values
when performing append-style increments (you put the delta in the next
version of a cell instead of incrementing the existing value). Then, during
scanning, those deltas are summed up into a single value. I assume you want
to do something along those lines, so you may learn something from that code.
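
The append-style increment idea described above can be modeled in a few lines. This is only an in-memory sketch under stated assumptions: DeltaCell is a hypothetical stand-in for one multi-version cell, with a TreeMap keyed by timestamp playing the role of the stored versions. It is not the CDAP implementation and uses no HBase classes.

```java
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one multi-version cell; not HBase code.
final class DeltaCell {

    // timestamp -> delta stored in that version
    private final TreeMap<Long, Long> versions = new TreeMap<>();

    // Append-style increment: store the delta as a new version instead of
    // reading and rewriting the current total. In this model, writing to an
    // existing timestamp replaces that version.
    void increment(long timestamp, long delta) {
        versions.put(timestamp, delta);
    }

    // Read path: sum the deltas of all versions into a single value.
    long read() {
        long sum = 0;
        for (long delta : versions.values()) {
            sum += delta;
        }
        return sum;
    }

    // Number of versions currently stored for this cell.
    int versionCount() {
        return versions.size();
    }
}
```

The write path never reads the current total, and the cost of combining the deltas moves to read time, which is the work the custom scanner takes on.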
Here's the RegionScanner implementation [1]. Note that in next(List<Cell>
cells, int limit) you'll need to check for crossing the boundary between
cells (i.e. the cells given to you may contain, e.g., 3 versions of a cell
in column1 followed by 2 versions of a cell in column2).
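
The boundary check mentioned above can be sketched without HBase as follows. The Version and Aggregate types are hypothetical flattened stand-ins for a Cell, and the input is assumed to arrive sorted so that all versions of one (row, column) pair are adjacent. A real scanner may also need to carry the running state across calls when a batch ends in the middle of a cell; that is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; these types are not part of the HBase API.
final class VersionBoundarySketch {

    // Hypothetical flattened stand-in for one version of a cell.
    static final class Version {
        final String row;
        final String column;
        final long timestamp;
        final long value;

        Version(String row, String column, long timestamp, long value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // One aggregated result per (row, column).
    static final class Aggregate {
        final String row;
        final String column;
        final long sum;

        Aggregate(String row, String column, long sum) {
            this.row = row;
            this.column = column;
            this.sum = sum;
        }
    }

    // Collapses adjacent versions of the same cell into one summed value.
    static List<Aggregate> collapse(List<Version> sorted) {
        List<Aggregate> out = new ArrayList<>();
        Version first = null; // first version of the cell being accumulated
        long sum = 0;
        for (Version v : sorted) {
            boolean sameCell = first != null
                    && first.row.equals(v.row)
                    && first.column.equals(v.column);
            if (!sameCell) {
                // Crossed a cell boundary: emit the finished cell, start a new one.
                if (first != null) {
                    out.add(new Aggregate(first.row, first.column, sum));
                }
                first = v;
                sum = 0;
            }
            sum += v.value;
        }
        if (first != null) {
            out.add(new Aggregate(first.row, first.column, sum)); // last cell
        }
        return out;
    }
}
```

With 3 versions of column1 followed by 2 versions of column2, as in the example above, collapse returns two aggregates, one per column.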
Here's the BaseRegionObserver implementation [2].
On a side note, be sure not to overuse the versions of a Cell. In many
cases, using columns is a better schema design [3].
Cheers,
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase
[1]
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java
[2]
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java
[3] http://hbase.apache.org/book.html#schema.versions