Posted to user@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/04/11 04:16:31 UTC

Using maxVersions=7 but rows disappear

I'm still thinking about how to use Accumulo to calculate weekly
moving averages. I thought that using the maxVersions setting might
work to maintain the last 7 values. Then a program could simply sum
the values of a given row. So this is what I did:

bin/accumulo shell -u root -p password
> createtable rolling
rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
rolling> insert row cf cq 1
rolling> insert row cf cq 2
rolling> insert row cf cq 3
rolling> insert row cf cq 4
rolling> insert row cf cq 5
rolling> insert row cf cq 6
rolling> insert row cf cq 7
rolling> insert row cf cq 8
rolling> scan
row cf:cq []    8
row cf:cq []    7
row cf:cq []    6
row cf:cq []    5
row cf:cq []    4
row cf:cq []    3
row cf:cq []    2

This is exactly what I wanted to see. So I wrote a simple scanner
program to read the table. Then I did another scan:

rolling> scan
row cf:cq []    8

Where did the rest of the records go?

Re: Using maxVersions=7 but rows disappear

Posted by David Medinets <da...@gmail.com>.
I thought there were issues associated with doing mutations inside iterators?

On Tue, Apr 10, 2012 at 10:35 PM, William Slacum <ws...@gmail.com> wrote:
> I don't think you'd necessarily need an aggregator for that, although it doesn't seem like that's what you're doing here in the first place. Wouldn't it be easier to set a summation iterator that also keeps a count of observations, do some of the math server side, and then combine it all on the client? That way you have a time series, and to get weekly averages you just change your scan range.

Re: Using maxVersions=7 but rows disappear

Posted by Adam Fuchs <ad...@ugov.gov>.
David,

In case of continuing confusion, I think it's best if you ignore Bill's
suggestion for now and heed Josh's advice. Bill's suggestion might be an
optimization to look at later on, but your initial approach seems sound.

Adam



On Tue, Apr 10, 2012 at 10:52 PM, David Medinets
<da...@gmail.com> wrote:

> I thought there were issues associated with doing mutations inside
> iterators?
>
> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum <ws...@gmail.com>
> wrote:
> > I don't think you'd necessarily need an aggregator for that, although
> > it doesn't seem like that's what you're doing here in the first place.
> > Wouldn't it be easier to set a summation iterator that also keeps a count
> > of observations, do some of the math server side, and then combine it all
> > on the client? That way you have a time series, and to get weekly averages
> > you just change your scan range.

Re: Using maxVersions=7 but rows disappear

Posted by William Slacum <ws...@gmail.com>.
I don't think you'd necessarily need an aggregator for that, although it doesn't seem like that's what you're doing here in the first place. Wouldn't it be easier to set a summation iterator that also keeps a count of observations, do some of the math server side, and then combine it all on the client? That way you have a time series, and to get weekly averages you just change your scan range.
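
To make the idea concrete, here is a rough Python sketch of the client-side half only. It assumes the server side has already reduced each day to a (sum, count) pair; the `daily` data, the date-string keys, and the helper names `combine` and `average_over` are all made up for illustration and are not part of any Accumulo API.

```python
from functools import reduce

# Hypothetical per-day partial results as (sum, count) pairs, keyed by date.
# In the suggestion above these would come back from a server-side iterator.
daily = {
    "2012-04-04": (10.0, 2),
    "2012-04-05": (6.0, 1),
    "2012-04-06": (9.0, 3),
    "2012-04-07": (4.0, 1),
    "2012-04-08": (8.0, 2),
    "2012-04-09": (5.0, 1),
    "2012-04-10": (14.0, 4),
    "2012-04-11": (7.0, 1),
}


def combine(a, b):
    """Merge two (sum, count) partials into one."""
    return (a[0] + b[0], a[1] + b[1])


def average_over(partials, start, end):
    """Average all observations whose day key lies in [start, end]."""
    selected = [v for k, v in sorted(partials.items()) if start <= k <= end]
    total, count = reduce(combine, selected, (0.0, 0))
    return total / count if count else None


# "Changing the scan range" is just choosing different start and end keys.
weekly = average_over(daily, "2012-04-04", "2012-04-10")
print(weekly)  # 4.0
```

Keeping the count next to the sum is what lets partial results be merged in any order; averaging per-day averages would weight days with few observations too heavily.
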
On Apr 10, 2012, at 10:16 PM, David Medinets wrote:

> I'm still thinking about how to use accumulo to calculate weekly
> moving averages. I thought that using the maxVersions settings might
> work to maintain the last 7 values. Then a program could simply sum
> the values of a given row. So this is what I did:
> 
> bin/accumulo shell -u root -p password
>> createtable rolling
> rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
> rolling> insert row cf cq 1
> rolling> insert row cf cq 2
> rolling> insert row cf cq 3
> rolling> insert row cf cq 4
> rolling> insert row cf cq 5
> rolling> insert row cf cq 6
> rolling> insert row cf cq 7
> rolling> insert row cf cq 8
> rolling> scan
> row cf:cq []    8
> row cf:cq []    7
> row cf:cq []    6
> row cf:cq []    5
> row cf:cq []    4
> row cf:cq []    3
> row cf:cq []    2
> 
> This is exactly what I wanted to see. So I wrote a simple scanner
> program to read the table. Then I did another scan:
> 
> rolling> scan
> row cf:cq []    8
> 
> Where did the rest of the records go?


Re: Using maxVersions=7 but rows disappear

Posted by Josh Elser <jo...@gmail.com>.
David,

I'd venture a guess that because you only set the scan maxVersions, when 
Accumulo minor compacted your 'rolling' table to flush those K/V pairs 
to disk, it deleted your first 6 versions that you saw when performing 
the scan.

You can determine if this is actually what happened by running your 
inserts below, and calling 'compact' on the table before performing the 
scan.

To fix this, try setting the same option for the minc and majc scopes as
well. Most likely (but don't quote me):

config -t rolling -s table.iterator.minc.vers.opt.maxVersions=7
config -t rolling -s table.iterator.majc.vers.opt.maxVersions=7
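
If it helps to see why the scopes matter, here is a toy Python model of the guess above. It is not Accumulo code: `keep_newest` is a made-up stand-in for a versioning filter, and the `scopes` values assume that minc and majc were still at a default of 1 while only scan had been raised to 7.

```python
def keep_newest(versions, max_versions):
    """Return the newest max_versions entries; input is ordered oldest first."""
    return versions[-max_versions:]


# Values inserted into the single cell (row, cf:cq), oldest first.
stored = [1, 2, 3, 4, 5, 6, 7, 8]

# Assumed settings after the original config command: only the scan scope
# was changed; minc and majc are assumed to still keep one version.
scopes = {"scan": 7, "minc": 1, "majc": 1}

# A scan before any compaction only filters what is returned to the client.
before = keep_newest(stored, scopes["scan"])

# A minor compaction rewrites the data through its own filter, so the
# older versions are gone for good, whatever the scan scope says.
stored = keep_newest(stored, scopes["minc"])
after = keep_newest(stored, scopes["scan"])

print(before)  # [2, 3, 4, 5, 6, 7, 8]
print(after)   # [8]
```

With all three scopes set to 7 in this model, the second scan would return the same seven values as the first.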

- Josh

On 4/10/2012 10:16 PM, David Medinets wrote:
> I'm still thinking about how to use accumulo to calculate weekly
> moving averages. I thought that using the maxVersions settings might
> work to maintain the last 7 values. Then a program could simply sum
> the values of a given row. So this is what I did:
>
> bin/accumulo shell -u root -p password
>> createtable rolling
> rolling>  config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
> rolling>  insert row cf cq 1
> rolling>  insert row cf cq 2
> rolling>  insert row cf cq 3
> rolling>  insert row cf cq 4
> rolling>  insert row cf cq 5
> rolling>  insert row cf cq 6
> rolling>  insert row cf cq 7
> rolling>  insert row cf cq 8
> rolling>  scan
> row cf:cq []    8
> row cf:cq []    7
> row cf:cq []    6
> row cf:cq []    5
> row cf:cq []    4
> row cf:cq []    3
> row cf:cq []    2
>
> This is exactly what I wanted to see. So I wrote a simple scanner
> program to read the table. Then I did another scan:
>
> rolling>  scan
> row cf:cq []    8
>
> Where did the rest of the records go?