Posted to user@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/04/12 04:59:46 UTC

Using Accumulo To Calculate Seven Day Rolling Average

Thanks. Using this technique seems to work. I wrote a blog entry to document it:

Using Accumulo To Calculate Seven Day Rolling Average
http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html

On Wed, Apr 11, 2012 at 2:20 PM, Adam Fuchs <ad...@ugov.gov> wrote:
> David,
>
> In case of continuing confusion, I think it's best if you ignore Bill's
> suggestion for now and heed Josh's advice. Bill's suggestion might be an
> optimization to look at later on, but your initial approach seems sound.
>
> Adam
>
>
>
> On Tue, Apr 10, 2012 at 10:52 PM, David Medinets <da...@gmail.com>
> wrote:
>>
>> I thought there were issues associated with doing mutations inside
>> iterators?
>>
>> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum <ws...@gmail.com>
>> wrote:
>> > I don't think you'd necessarily need an aggregator for that, although
>> > it doesn't seem like that's what you're doing here in the first place.
>> > Wouldn't it be easier to set a summation iterator that also keeps a count
>> > of observations to do some server-side math and then combine it all on the
>> > client? That way you can have a time series, and to get weekly averages you
>> > just change your scan range.
>> > On Apr 10, 2012, at 10:16 PM, David Medinets wrote:
>> >
>> >> I'm still thinking about how to use Accumulo to calculate weekly
>> >> moving averages. I thought that using the maxVersions setting might
>> >> work to maintain the last 7 values. Then a program could simply sum
>> >> the values of a given row. So this is what I did:
>> >>
>> >> bin/accumulo shell -u root -p password
>> >>> createtable rolling
>> >> rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
>> >> rolling> insert row cf cq 1
>> >> rolling> insert row cf cq 2
>> >> rolling> insert row cf cq 3
>> >> rolling> insert row cf cq 4
>> >> rolling> insert row cf cq 5
>> >> rolling> insert row cf cq 6
>> >> rolling> insert row cf cq 7
>> >> rolling> insert row cf cq 8
>> >> rolling> scan
>> >> row cf:cq []    8
>> >> row cf:cq []    7
>> >> row cf:cq []    6
>> >> row cf:cq []    5
>> >> row cf:cq []    4
>> >> row cf:cq []    3
>> >> row cf:cq []    2
>> >>
>> >> This is exactly what I wanted to see. So I wrote a simple scanner
>> >> program to read the table. Then I did another scan:
>> >>
>> >> rolling> scan
>> >> row cf:cq []    8
>> >>
>> >> Where did the rest of the records go?
>> >
>
>
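A likely explanation for the missing versions, assuming the table kept Accumulo's default iterator settings: createtable attaches a VersioningIterator named vers at all three scopes (scan, minc, and majc), each with maxVersions=1, and the config command above changed only the scan-scope option. Data still in memory passed through the scan-scope limit of 7, but if it was flushed to disk (minor compaction) or its files were merged (major compaction) at some point between the two scans, the minc/majc versioning iterators kept only the newest version, permanently. Setting table.iterator.minc.vers.opt.maxVersions=7 and table.iterator.majc.vers.opt.maxVersions=7 as well should keep seven versions on disk. The toy model below is plain Java, not Accumulo code; VersioningModel is a hypothetical class that only mimics this behavior for a single row and column.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Accumulo code) of versioning at different scopes
 * for one row/column. Versions are kept newest first, as Accumulo sorts them.
 */
final class VersioningModel {
    private final List<Long> newestFirst = new ArrayList<>();

    // A new insert becomes the newest version.
    void insert(long value) {
        newestFirst.add(0, value);
    }

    // Like a versioning iterator: pass through only the newest maxVersions entries.
    static List<Long> keepNewest(List<Long> versions, int maxVersions) {
        return new ArrayList<>(versions.subList(0, Math.min(maxVersions, versions.size())));
    }

    // A scan applies the scan-scope limit but does not change what is stored.
    List<Long> scan(int scanMaxVersions) {
        return keepNewest(newestFirst, scanMaxVersions);
    }

    // A flush or compaction rewrites storage through the minc/majc limit, so
    // anything beyond that limit is gone for all later scans.
    void compact(int compactionMaxVersions) {
        List<Long> kept = keepNewest(newestFirst, compactionMaxVersions);
        newestFirst.clear();
        newestFirst.addAll(kept);
    }
}
```

Inserting 1 through 8 and scanning with a limit of 7 returns 8 down to 2, as in the first scan above; after compact(1), the same scan returns only 8.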

Re: Using Accumulo To Calculate Seven Day Rolling Average

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Friday, May 18, 2012 9:20:58 PM, "David Medinets" <da...@gmail.com> wrote:
> I'm replying a little late but Combiners replace the original values.
> Therefore, I don't think they can be used to calculate the kind of
> rolling averages I am calculating. There are other kinds of moving
> averages that don't depend on historical data, but frankly I don't
> remember their names.

Combiners do replace the original values, but the result does not have to be written back to the Accumulo table.  If you configure a Combiner for the scan scope only (not the minc or majc scopes), every scan will see newly combined values based on the underlying data.  If you want to see combined values sometimes and the underlying data sometimes, you can instead add a Combiner to a particular scanner with the addScanIterator method (also see setscaniter in the shell).

So, iterators configured for the scan scope do not always need to be configured for minc (flushing to disk) and majc (merging files) scopes.  We have not yet encountered applications where the opposite is true, which means that iterators configured for minc or majc scopes generally should be configured for all three scopes (minc, majc, and scan) so that a consistent view of the data is provided.
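One concrete reason an averaging Combiner belongs at the scan scope only: if it also ran at minc or majc, each file could end up holding the average of only part of the data, and a later compaction or scan would then average those averages. Keeping a sum and a count instead, as suggested earlier in the thread, avoids this because (sum, count) pairs merge exactly however the values happen to be split. A minimal plain-Java sketch; SumCount is a hypothetical name, not an Accumulo class, and the split of the values across two files is an invented illustration.

```java
import java.util.List;

/**
 * Hypothetical partial aggregate for a mean: a running sum and a count.
 * Two partials merge exactly, in any grouping, which a plain average does not.
 */
final class SumCount {
    final long sum;
    final long count;

    SumCount(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Summarize one batch of raw values (e.g. the values in one file).
    static SumCount of(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum = Math.addExact(sum, v); // throws on overflow instead of wrapping
        }
        return new SumCount(sum, values.size());
    }

    // Combine two partials by adding sums and counts; nothing is lost.
    SumCount merge(SumCount other) {
        return new SumCount(Math.addExact(sum, other.sum), count + other.count);
    }

    // Divide only at the very end, e.g. on the client.
    double mean() {
        return (double) sum / count;
    }
}
```

With values 2 through 6 in one file and 7 and 8 in another, averaging the two partial averages gives (4 + 7.5) / 2 = 5.75, while merging the pairs gives 35 / 7 = 5.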

Billie



Re: Using Accumulo To Calculate Seven Day Rolling Average

Posted by David Medinets <da...@gmail.com>.
I'm replying a little late but Combiners replace the original values.
Therefore, I don't think they can be used to calculate the kind of
rolling averages I am calculating. There are other kinds of moving
averages that don't depend on historical data, but frankly I don't
remember their names.

On Thu, Apr 12, 2012 at 10:25 PM, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> You could alternatively use a Combiner like the following to calculate the average (though I haven't tested this bit of code).  You would configure this as a scan-time iterator (either a persistent scan iterator for the table, or attached to a particular Scanner) and would use the STRING encoding type of the LongCombiner.  Not that it would necessarily be better to use a Combiner to average together 7 things, but I thought it would make a good example.
>
> public class AveragingCombiner extends LongCombiner {
>  @Override
>  public Long typedReduce(Key key, Iterator<Long> iter) {
>    long sum = 0;
>    long count = 0;
>    while (iter.hasNext()) {
>      sum = safeAdd(sum, iter.next());
>      count++;
>    }
>    return sum/count;
>  }
> }
>
> Billie
>
>
> ----- Original Message -----
>> From: "David Medinets" <da...@gmail.com>
>> To: user@accumulo.apache.org
>> Sent: Wednesday, April 11, 2012 10:59:46 PM
>> Subject: Using Accumulo To Calculate Seven Day Rolling Average
>> Thanks. Using this technique seems to work. I wrote a blog entry to
>> document it:
>>
>> Using Accumulo To Calculate Seven Day Rolling Average
>> http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html
>>
>> On Wed, Apr 11, 2012 at 2:20 PM, Adam Fuchs <ad...@ugov.gov>
>> wrote:
>> > David,
>> >
>> > In case of continuing confusion, I think it's best if you ignore
>> > Bill's
>> > suggestion for now and heed Josh's advice. Bill's suggestion might
>> > be an
>> > optimization to look at later on, but your initial approach seems
>> > sound.
>> >
>> > Adam
>> >
>> >
>> >
>> > On Tue, Apr 10, 2012 at 10:52 PM, David Medinets
>> > <da...@gmail.com>
>> > wrote:
>> >>
>> >> I thought there were issues associated with doing mutations inside
>> >> iterators?
>> >>
>> >> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum
>> >> <ws...@gmail.com>
>> >> wrote:
>> >> > I don't think you'd necessarily need a an aggregator for that,
>> >> > although
>> >> > it doesn't seem like that's what you're doing here in the first
>> >> > place.
>> >> > Wouldn't it be easier to set a summation iterator that also keeps
>> >> > a count of
>> >> > of observations to do some server side math and then combine it
>> >> > all on the
>> >> > client? That way you can have a time series and to get weekly
>> >> > averages you
>> >> > just change your scan range.
>> >> > On Apr 10, 2012, at 10:16 PM, David Medinets wrote:
>> >> >
>> >> >> I'm still thinking about how to use accumulo to calculate weekly
>> >> >> moving averages. I thought that using the maxVersions settings
>> >> >> might
>> >> >> work to maintain the last 7 values. Then a program could simply
>> >> >> sum
>> >> >> the values of a given row. So this is what I did:
>> >> >>
>> >> >> bin/accumulo shell -u root -p password
>> >> >>> createtable rolling
>> >> >> rolling> config -t rolling -s
>> >> >> table.iterator.scan.vers.opt.maxVersions=7
>> >> >> rolling> insert row cf cq 1
>> >> >> rolling> insert row cf cq 2
>> >> >> rolling> insert row cf cq 3
>> >> >> rolling> insert row cf cq 4
>> >> >> rolling> insert row cf cq 5
>> >> >> rolling> insert row cf cq 6
>> >> >> rolling> insert row cf cq 7
>> >> >> rolling> insert row cf cq 8
>> >> >> rolling> scan
>> >> >> row cf:cq [] 8
>> >> >> row cf:cq [] 7
>> >> >> row cf:cq [] 6
>> >> >> row cf:cq [] 5
>> >> >> row cf:cq [] 4
>> >> >> row cf:cq [] 3
>> >> >> row cf:cq [] 2
>> >> >>
>> >> >> This is exactly what I wanted to see. So I wrote a simple
>> >> >> scanner
>> >> >> program to read the table. Then I did another scan:
>> >> >>
>> >> >> rolling> scan
>> >> >> row cf:cq [] 8
>> >> >>
>> >> >> Where did the rest of the records go?
>> >> >
>> >
>> >

Re: Using Accumulo To Calculate Seven Day Rolling Average

Posted by Adam Fuchs <ad...@ugov.gov>.
You could use a combiner for values that match the same day, and then roll
off whole days. This could be used along with a scan-time combiner to do
averages across multiple days.

Alternatively, s/day/hour/g or s/day/minute/g.
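The day-bucket idea, combined with the "change your scan range" suggestion earlier in the thread, could look roughly like this: one (sum, count) pair per day, old days dropped whole, and the seven-day average computed over a range of days at read time. In Accumulo the day would presumably be part of the key (for example a sortable yyyyMMdd qualifier), with a combiner maintaining each day's pair; how days are rolled off in Accumulo is not shown. This plain-Java sketch only models the arithmetic, and DailyBuckets is a hypothetical class.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical model of one row whose columns are days. Each day holds a
 * (sum, count) pair, the kind of value a per-day combiner would maintain.
 */
final class DailyBuckets {
    // LocalDate keys sort chronologically, like a yyyyMMdd qualifier would.
    private final TreeMap<LocalDate, long[]> days = new TreeMap<>();

    // Fold one observation into its day's (sum, count).
    void add(LocalDate day, long value) {
        long[] sumCount = days.computeIfAbsent(day, d -> new long[2]);
        sumCount[0] = Math.addExact(sumCount[0], value);
        sumCount[1]++;
    }

    // "Roll off whole days": drop every bucket strictly before the cutoff.
    void rollOffBefore(LocalDate cutoff) {
        days.headMap(cutoff, false).clear();
    }

    // Range over the 7 days ending at endDay (inclusive) and combine the pairs.
    double sevenDayAverage(LocalDate endDay) {
        NavigableMap<LocalDate, long[]> window =
                days.subMap(endDay.minusDays(6), true, endDay, true);
        long sum = 0;
        long count = 0;
        for (long[] sumCount : window.values()) {
            sum = Math.addExact(sum, sumCount[0]);
            count += sumCount[1];
        }
        if (count == 0) {
            throw new IllegalStateException("no observations in the window");
        }
        return (double) sum / count;
    }
}
```

Switching to hours or minutes only changes the key type and the window arithmetic.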

Exponentially weighted moving averages might also be cool to do in a
combiner:
http://en.wikipedia.org/wiki/Exponential_decay
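An exponentially weighted moving average is one kind of moving average that does not need the historical values: each update needs only the previous average and the new observation, s = alpha * x + (1 - alpha) * s_prev, which is why it suits something like a Combiner that keeps one value per key. A plain-Java sketch, not Accumulo code; the smoothing factor alpha is a free parameter you would choose, and a real combiner would also have to apply updates in timestamp order, which this sketch does not address.

```java
/**
 * Exponentially weighted moving average:
 * s = alpha * x + (1 - alpha) * s_prev. Only s is stored between updates.
 */
final class Ewma {
    private final double alpha; // smoothing factor in (0, 1]; larger reacts faster
    private double value;
    private boolean initialized;

    Ewma(double alpha) {
        if (!(alpha > 0 && alpha <= 1)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // Fold in one observation; the first observation seeds the average.
    double update(double x) {
        value = initialized ? alpha * x + (1 - alpha) * value : x;
        initialized = true;
        return value;
    }

    double value() {
        return value;
    }
}
```

With alpha = 0.5 and inputs 1 through 8, the average ends at 7.0078125, weighted heavily toward recent values, unlike the simple mean of the last seven values (5).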

Cheers,
Adam


On Fri, May 18, 2012 at 9:21 PM, David Medinets <da...@gmail.com> wrote:

> I'm replying a little late but Combiners replace the original values.
> Therefore, I don't think they can be used to calculate the kind of
> rolling averages I am calculating. There are other kinds of moving
> averages that don't depend on historical data, but frankly I don't
> remember their names.

Re: Using Accumulo To Calculate Seven Day Rolling Average

Posted by Billie J Rinaldi <bi...@ugov.gov>.
You could alternatively use a Combiner like the following to calculate the average (though I haven't tested this bit of code).  You would configure this as a scan-time iterator (either a persistent scan iterator for the table, or attached to a particular Scanner) and would use the STRING encoding type of the LongCombiner.  Not that it would necessarily be better to use a Combiner to average together 7 things, but I thought it would make a good example.

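import java.util.Iterator;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.iterators.LongCombiner;

// Averages all versions of a key. Meant for the scan scope only: at minc/majc
// it would store partial averages, and an average of averages is not the mean.
public class AveragingCombiner extends LongCombiner {
  @Override
  public Long typedReduce(Key key, Iterator<Long> iter) {
    long sum = 0;
    long count = 0;
    while (iter.hasNext()) {
      sum = safeAdd(sum, iter.next());  // LongCombiner helper that guards against overflow
      count++;
    }
    // typedReduce is called for a key with at least one value, so count > 0.
    // Long division truncates the average toward zero.
    return sum / count;
  }
}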
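To check what the reduce step computes without an Accumulo installation, here is the same loop in plain Java over STRING-encoded values, the way the shell's insert command stores them. StringAverage is a hypothetical stand-in, not part of Accumulo, and Math.addExact replaces LongCombiner's safeAdd, so overflow handling differs (addExact throws). Because of long division, the seven versions 8 down to 2 average exactly 5, but 1 through 8 would give 36 / 8 = 4 rather than 4.5.

```java
import java.util.Iterator;

/**
 * Hypothetical plain-Java stand-in for the typedReduce loop above,
 * fed with STRING-encoded values (decimal strings).
 */
final class StringAverage {
    // Decode each value, then sum and count exactly as typedReduce does.
    static long reduce(Iterator<String> encoded) {
        long sum = 0;
        long count = 0;
        while (encoded.hasNext()) {
            sum = Math.addExact(sum, Long.parseLong(encoded.next()));
            count++;
        }
        return sum / count; // long division truncates toward zero
    }
}
```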

Billie


----- Original Message -----
> From: "David Medinets" <da...@gmail.com>
> To: user@accumulo.apache.org
> Sent: Wednesday, April 11, 2012 10:59:46 PM
> Subject: Using Accumulo To Calculate Seven Day Rolling Average
> Thanks. Using this technique seems to work. I wrote a blog entry to
> document it:
> 
> Using Accumulo To Calculate Seven Day Rolling Average
> http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html