Posted to user@hbase.apache.org by Nicolae Marasoiu <ni...@gmail.com> on 2015/08/07 18:42:07 UTC

groupby(prefix(rowkey)) with multiple custom aggregated columns

Hi,

I need to implement a limited SQL-like filter + group + order. The grouping
is on a fixed-length prefix of the rowkey (the length is fixed per query),
and the results are multiple metrics, including some custom ones such as
statistical unique counts.

I noticed that the available coprocessor tooling, such as
ColumnAggregationProtocol, involves just one metric, e.g. one sum(column).
We collect many, and of course it is more efficient to scan the data once.

Please advise,
Nicu
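
To make the request above concrete, here is a minimal sketch of "group by a fixed-length rowkey prefix, several metrics, one pass over the data". It is plain Java with no HBase dependency, so it only illustrates the accumulation logic, not where it would run (client or coprocessor). The Row shape, the bytesSent and userId columns, and every class and method name are hypothetical. The unique count here is exact (a HashSet); a statistical estimator could take its place.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class PrefixGroupBy {
    /** One input row: a rowkey plus two columns (hypothetical schema). */
    static final class Row {
        final byte[] rowKey;
        final long bytesSent;
        final String userId;

        Row(byte[] rowKey, long bytesSent, String userId) {
            this.rowKey = rowKey;
            this.bytesSent = bytesSent;
            this.userId = userId;
        }
    }

    /** All metrics of one group, updated together so each row is read once. */
    static final class Metrics {
        long rowCount;
        long bytesSentSum;
        final Set<String> users = new HashSet<>(); // exact distinct users

        void add(Row row) {
            rowCount++;
            bytesSentSum += row.bytesSent;
            users.add(row.userId);
        }

        long uniqueUsers() {
            return users.size();
        }
    }

    /** Groups rows by the first prefixLength bytes of the rowkey in a single pass. */
    static Map<String, Metrics> aggregate(List<Row> rows, int prefixLength) {
        Map<String, Metrics> groups = new TreeMap<>();
        for (Row row : rows) {
            // copyOf pads with zero bytes if a rowkey is shorter than the prefix.
            byte[] prefix = Arrays.copyOf(row.rowKey, prefixLength);
            // ISO-8859-1 maps each byte to one char, so the String is a lossless map key.
            String key = new String(prefix, StandardCharsets.ISO_8859_1);
            groups.computeIfAbsent(key, k -> new Metrics()).add(row);
        }
        return groups;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(key("aa01"), 10, "u1"),
            new Row(key("aa02"), 20, "u2"),
            new Row(key("ab01"), 7, "u3"));
        aggregate(rows, 2).forEach((prefix, m) -> System.out.println(
            prefix + " rows=" + m.rowCount + " sum=" + m.bytesSentSum
                + " unique=" + m.uniqueUsers()));
    }
}
```

Keeping all metrics in one accumulator object per group is what lets a single scan feed every metric at once.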

Re: groupby(prefix(rowkey)) with multiple custom aggregated columns

Posted by anil gupta <an...@gmail.com>.
Hi Nicu,

Have you taken a look at Phoenix? It supports GROUP BY:
https://phoenix.apache.org/language/index.html
It will also give you much more SQL-like querying on HBase.

On Fri, Aug 7, 2015 at 2:19 PM, Ted Yu <yu...@gmail.com> wrote:

> Please take a look at
> hbase-client/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
> which shows several other aggregations.
>
> BTW group by functionality would involve some more work since rows for the
> same group may span multiple regions.
>
> Cheers



-- 
Thanks & Regards,
Anil Gupta

Re: groupby(prefix(rowkey)) with multiple custom aggregated columns

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at
hbase-client/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
which shows several other aggregations.

BTW, group-by functionality would involve some more work, since rows for the
same group may span multiple regions.
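
As an illustration of that extra work, the sketch below merges per-region partial results on the client. It is plain Java and does not use the HBase API. The Partial shape, its fields, and all names are hypothetical, and it assumes each region returns a map from group prefix to a partial aggregate. Sums and row counts can simply be added, but unique counts cannot, so each partial carries a mergeable structure (an exact set here; a mergeable statistical estimator would play the same role).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class RegionMerge {
    /** Partial aggregate for one group, as one region might return it (hypothetical shape). */
    static final class Partial {
        final long rowCount;
        final long valueSum;
        final Set<String> distinctIds;

        Partial(long rowCount, long valueSum, Set<String> distinctIds) {
            this.rowCount = rowCount;
            this.valueSum = valueSum;
            this.distinctIds = distinctIds;
        }

        /** Combines two partials: counts and sums add, distinct sets are unioned. */
        Partial merge(Partial other) {
            Set<String> ids = new HashSet<>(distinctIds);
            ids.addAll(other.distinctIds);
            return new Partial(rowCount + other.rowCount, valueSum + other.valueSum, ids);
        }
    }

    /** Merges per-region results so a group split across regions ends up as one entry. */
    static Map<String, Partial> mergeRegions(List<Map<String, Partial>> perRegion) {
        Map<String, Partial> merged = new TreeMap<>();
        for (Map<String, Partial> region : perRegion) {
            for (Map.Entry<String, Partial> entry : region.entrySet()) {
                // Insert the partial if the group is new, otherwise combine with the existing one.
                merged.merge(entry.getKey(), entry.getValue(), Partial::merge);
            }
        }
        return merged;
    }
}
```

Only groups whose prefix range straddles a region boundary actually receive more than one partial; the merge step treats all groups uniformly so the client does not need to know where the boundaries are.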

Cheers
