Posted to user@accumulo.apache.org by Benson Margulies <bi...@gmail.com> on 2012/02/10 23:10:21 UTC

grouping

It seems to me that in 1.3.5 there isn't a mechanism parallel to SQL
'group by'. That is, if you imagine some data like the following
(assume distinct rowids):

a b 1
a b 3
a c 2
a c 5
a d 3
a d 7

I want to end up with

a b 4
a c 7
a d 10

Do the 1.4 combiners cover this ground? Is there something in 1.3.5
that I'm missing?

Re: grouping

Posted by Benson Margulies <bi...@gmail.com>.
On Fri, Feb 10, 2012 at 7:43 PM, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> When you add an aggregator or combiner to a scan, it doesn't aggregate all the values in the scan range. It provides an aggregated value for each unique cell (i.e. row, column family, column qualifier, and column visibility tuple). It aggregates together values for keys that only differ by timestamp.
>
> If there is a VersioningIterator configured for the table (which there is by default), make sure to set the aggregator or combiner at a lower "priority" than the versioning, so that it occurs first -- or just remove the VersioningIterator from the table.

OK, I see; I have a different problem. I want to be able to control
the definition of a unique cell, and only aggregate values with the
same rowid and CF, not the CQ. So I guess I'll be doing my own
aggregation for the foreseeable future unless I restructure the data.

>
> Billie
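A minimal client-side sketch of the "do my own aggregation" route: sum values per (row, column family) and ignore the qualifier, in the spirit of SQL GROUP BY. It assumes the first two columns of the example data are row and column family and that each entry has its own qualifier (q1..q6 are made up). The Cell class is hypothetical and stands in for a scanned Accumulo Key/Value pair; it does not use the real Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByRowAndFamily {

    // Hypothetical stand-in for one scanned entry (key fields plus a numeric value).
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final long value;

        Cell(String row, String family, String qualifier, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Sums values per (row, family), ignoring the qualifier, like
    // SELECT row, family, SUM(value) ... GROUP BY row, family.
    static Map<String, Long> sumByRowAndFamily(List<Cell> cells) {
        Map<String, Long> sums = new TreeMap<>();
        for (Cell c : cells) {
            // Composite grouping key; assumes '\0' never appears in row or family.
            String group = c.row + "\0" + c.family;
            sums.merge(group, c.value, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("a", "b", "q1", 1));
        cells.add(new Cell("a", "b", "q2", 3));
        cells.add(new Cell("a", "c", "q3", 2));
        cells.add(new Cell("a", "c", "q4", 5));
        cells.add(new Cell("a", "d", "q5", 3));
        cells.add(new Cell("a", "d", "q6", 7));
        for (Map.Entry<String, Long> e : sumByRowAndFamily(cells).entrySet()) {
            // Prints "a b 4", "a c 7", "a d 10", one per line.
            System.out.println(e.getKey().replace('\0', ' ') + " " + e.getValue());
        }
    }
}
```

Because a scan returns entries in sorted order, a real client could also emit each group as soon as the row or family changes instead of holding a map; the TreeMap just keeps the sketch short.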

Re: grouping

Posted by Billie J Rinaldi <bi...@ugov.gov>.
When you add an aggregator or combiner to a scan, it doesn't aggregate all the values in the scan range. It provides an aggregated value for each unique cell (i.e. row, column family, column qualifier, and column visibility tuple). It aggregates together values for keys that only differ by timestamp.

If there is a VersioningIterator configured for the table (which there is by default), make sure to set the aggregator or combiner at a lower "priority" than the versioning, so that it occurs first -- or just remove the VersioningIterator from the table.

Billie
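To make the per-cell behaviour concrete, here is a simplified model (not Accumulo code) of what a combiner does with the entries it sees: adjacent entries whose row, column family, column qualifier, and visibility all match, differing only by timestamp, are summed into one. The Entry class and combine method are hypothetical. Since Accumulo sorts timestamps in descending order, the sketch keeps the key of the first (newest) entry in each run.

```java
import java.util.ArrayList;
import java.util.List;

public class CombinerSemantics {

    // Hypothetical simplified key/value; real Accumulo keys also carry a delete flag.
    static final class Entry {
        final String row, family, qualifier, visibility;
        final long timestamp;
        final long value;

        Entry(String row, String family, String qualifier, String visibility,
              long timestamp, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
            this.value = value;
        }

        // True when the two entries differ at most by timestamp (same "cell").
        boolean sameCell(Entry o) {
            return row.equals(o.row) && family.equals(o.family)
                && qualifier.equals(o.qualifier) && visibility.equals(o.visibility);
        }
    }

    // Sums runs of adjacent entries in the same cell. Input must already be in
    // key-sorted order (timestamps descending within a cell), as an iterator sees it.
    static List<Entry> combine(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : sorted) {
            if (!out.isEmpty() && out.get(out.size() - 1).sameCell(e)) {
                Entry first = out.remove(out.size() - 1);
                // Keep the first (newest) key of the run, add the values.
                out.add(new Entry(first.row, first.family, first.qualifier,
                        first.visibility, first.timestamp, first.value + e.value));
            } else {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        // Same cell written twice, differing only by timestamp: combined.
        entries.add(new Entry("a", "b", "q1", "", 20, 1));
        entries.add(new Entry("a", "b", "q1", "", 10, 3));
        // Different qualifier: a separate cell, left alone.
        entries.add(new Entry("a", "b", "q2", "", 10, 5));
        for (Entry e : combine(entries)) {
            System.out.println(e.row + " " + e.family + " " + e.qualifier
                    + " ts=" + e.timestamp + " -> " + e.value);
        }
    }
}
```

Run on data shaped like the example in this thread, with a distinct qualifier per entry, nothing is merged, which is why a per-cell combiner does not produce a per-family total.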

Re: grouping

Posted by Benson Margulies <bi...@gmail.com>.
Billie,

I don't think that this quite does it, or, at least, I don't see how
to use it. The aggregator gets pointed at all the values visited by
the scan, and isn't passed the CF/CQ for the current value. So I'd get
a sum of all the values, not the partial sums of the matching groups.
Unless there's some other grouping behavior that I'm missing.

--benson

On Fri, Feb 10, 2012 at 6:51 PM, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> In 1.3 you can use Aggregators (e.g. StringSummation) assuming that what you're aggregating is in the Value. Aggregators still exist in 1.4, but are being replaced by Combiners.
>
> Billie

Re: grouping

Posted by Billie J Rinaldi <bi...@ugov.gov>.
In 1.3 you can use Aggregators (e.g. StringSummation) assuming that what you're aggregating is in the Value. Aggregators still exist in 1.4, but are being replaced by Combiners.

Billie
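For reference, a sketch of the shape of a 1.3-style summing aggregator. As we understand it, the 1.3 Aggregator interface has reset(), collect(Value) and aggregate() methods; the SimpleAggregator interface below is a hypothetical, self-contained imitation that uses byte[] in place of Value, and StringSum only approximates what StringSummation does (parse each value as a decimal string, add as a long, return the total as a string). Note that collect receives only the value, which is the limitation raised above: nothing in this contract exposes the row, family, or qualifier.

```java
import java.nio.charset.StandardCharsets;

public class StringSummationSketch {

    // Hypothetical interface; byte[] stands in for Accumulo's Value.
    interface SimpleAggregator {
        void reset();
        void collect(byte[] value);
        byte[] aggregate();
    }

    // Treats each value as a decimal string and sums the values as longs.
    static final class StringSum implements SimpleAggregator {
        private long sum = 0;

        public void reset() {
            sum = 0;
        }

        public void collect(byte[] value) {
            sum += Long.parseLong(new String(value, StandardCharsets.UTF_8));
        }

        public byte[] aggregate() {
            return Long.toString(sum).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        SimpleAggregator agg = new StringSum();
        agg.reset();
        for (String v : new String[] {"1", "3"}) {
            // Only the value is passed in; the key is never visible here.
            agg.collect(v.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(new String(agg.aggregate(), StandardCharsets.UTF_8)); // prints 4
    }
}
```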