Posted to issues@hbase.apache.org by "Himanshu Vashishtha (JIRA)" <ji...@apache.org> on 2010/11/01 06:29:27 UTC

[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

    [ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926831#action_12926831 ] 

Himanshu Vashishtha commented on HBASE-1512:
--------------------------------------------

With the 2001 patch, the basic infrastructure these functions need is available. I wrote a test class that covers some of them, but I am unsure how 'generic' they are expected to be.

Here, I assumed the user knows the table in question and the return types he gets from the Coprocessor impls, so the input/output types of these agg operations will be the same as well. He therefore builds the agg function classes with those 'types'. I think this is a somewhat skewed assumption, and it needs further clarification. What is expected of the 'end interface'?

I have attached the new/modified classes (2/1). 
a) ProcessResultsFromCP: to be implemented by the agg functions (can be part of the Batch class). 
b) TestAggFunctions: has the test case using the agg functions.
c) HTable: one method to execute the aggregation functions.
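
To make the typing question concrete, here is a minimal sketch of what a result-processing callback could look like. It is not the attached ProcessResultsFromCP (its signature is not shown in this message); the names RegionResultCallback, SumCallback, AvgCallback and aggregate are hypothetical, and the per-region results are faked with a plain Map instead of real coprocessor calls. The point is only that the partial type a region returns (R) and the final type the user sees (T) can be separate type parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggSketch {

    /**
     * Hypothetical callback (not the attached ProcessResultsFromCP):
     * R is the partial result one region returns, T is the final value.
     */
    interface RegionResultCallback<R, T> {
        void update(String regionName, R partial);
        T result();
    }

    /** Sum: partial and final types are both Long. */
    static final class SumCallback implements RegionResultCallback<Long, Long> {
        private long total;

        @Override
        public void update(String regionName, Long partial) {
            if (partial != null) {
                total += partial;
            }
        }

        @Override
        public Long result() {
            return total;
        }
    }

    /** Average: each region returns {sum, count}; the final value is a Double. */
    static final class AvgCallback implements RegionResultCallback<long[], Double> {
        private long sum;
        private long count;

        @Override
        public void update(String regionName, long[] partial) {
            sum += partial[0];
            count += partial[1];
        }

        @Override
        public Double result() {
            return count == 0 ? Double.NaN : (double) sum / count;
        }
    }

    /** Feeds every per-region partial result to the callback, then asks for the final value. */
    static <R, T> T aggregate(Map<String, R> perRegion, RegionResultCallback<R, T> callback) {
        for (Map.Entry<String, R> e : perRegion.entrySet()) {
            callback.update(e.getKey(), e.getValue());
        }
        return callback.result();
    }

    public static void main(String[] args) {
        // Stand-ins for what each region's coprocessor would return.
        Map<String, Long> sums = new LinkedHashMap<>();
        sums.put("region-a", 10L);
        sums.put("region-b", 32L);
        System.out.println("sum = " + aggregate(sums, new SumCallback()));

        Map<String, long[]> sumCounts = new LinkedHashMap<>();
        sumCounts.put("region-a", new long[] {10L, 2L});
        sumCounts.put("region-b", new long[] {32L, 4L});
        System.out.println("avg = " + aggregate(sumCounts, new AvgCallback()));
    }
}
```

In this sketch a sum keeps the same type end to end, while an average needs a (sum, count) pair from each region and produces a different final type, which is one case where assuming identical input and output types would not hold.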

There is a high probability that I have misread the desired feature entirely, so please feel free to 'lambaste' the code and its underlying assumptions.

PS: I was thinking of making this JIRA a sub-item of JIRA 2469, but couldn't come up with anything worth mentioning.


> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>         Attachments: 1512.zip
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys.  A bunch of time and resources have been wasted returning data that we're not interested in.  With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows).   We could have it so the count was just done per region and return that.  Or we could maybe make a small change in scanner too so that it aggregated the per-region counts.  
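
The row-count example in the description can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: Region, countRows, clientSideCount and serverSideCount are hypothetical names, the regions are plain sorted maps, and no HBase API is involved. It contrasts walking every row on the client with asking each region for a single number and summing those numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RowCountSketch {

    /** Stand-in for one region: a sorted slice of row key -> value. */
    static final class Region {
        final TreeMap<String, byte[]> rows = new TreeMap<>();

        /** What a server-side count would hand back: one long, no row data. */
        long countRows() {
            return rows.size();
        }
    }

    /** Current approach: every row is returned to the client, which counts row keys. */
    static long clientSideCount(List<Region> regions) {
        long count = 0;
        for (Region region : regions) {
            for (String rowKey : region.rows.keySet()) {
                count++; // in a real scan, this row's data would have crossed the network
            }
        }
        return count;
    }

    /** Proposed approach: each region counts locally and the client sums the per-region counts. */
    static long serverSideCount(List<Region> regions) {
        long total = 0;
        for (Region region : regions) {
            total += region.countRows();
        }
        return total;
    }

    /** Two small regions with made-up row keys, for demonstration only. */
    static List<Region> sampleRegions() {
        Region first = new Region();
        first.rows.put("a", new byte[] {1});
        first.rows.put("b", new byte[] {2});
        first.rows.put("c", new byte[] {3});

        Region second = new Region();
        second.rows.put("m", new byte[] {4});
        second.rows.put("n", new byte[] {5});

        List<Region> regions = new ArrayList<>();
        regions.add(first);
        regions.add(second);
        return regions;
    }

    public static void main(String[] args) {
        List<Region> regions = sampleRegions();
        System.out.println("client-side count = " + clientSideCount(regions));
        System.out.println("server-side count = " + serverSideCount(regions));
    }
}
```

Both methods give the same answer; the difference is that the second needs only one number per region to reach the client.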

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.