Posted to user@accumulo.apache.org by "Dickson, Matt MR" <ma...@defence.gov.au> on 2013/02/12 00:47:28 UTC

MinCombiner and MaxCombiner priority issue [SEC=UNCLASSIFIED]

UNCLASSIFIED

Hi,

I'm reasonably new to using Accumulo so I apologise if some of my terminology is incorrect.

A bit of overview

We have an Accumulo table that ingests data in daily increments and ages off data in daily increments.  For each unique rowid we maintain a daily max and min value and a count, using the MinCombiner, MaxCombiner and SummingCombiner.  When a user queries the table for a rowid, scan iterators are added to calculate the min, max and count across the entire table by combining the daily summaries of min, max and count.
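
As a rough illustration of the scan-time roll-up described above, here is a minimal sketch in plain Java (not Accumulo code). It assumes each daily summary is stored as a decimal string, as with the STRING encoding type, and folds the daily values into an overall min, max and count. The class and method names are hypothetical, and the split of values across two days is made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class DailySummaryFold {

  // Smallest of the daily minimums (values are decimal strings).
  static long min(List<String> dailyMins) {
    long result = Long.MAX_VALUE;
    for (String v : dailyMins) {
      result = Math.min(result, Long.parseLong(v));
    }
    return result;
  }

  // Largest of the daily maximums.
  static long max(List<String> dailyMaxes) {
    long result = Long.MIN_VALUE;
    for (String v : dailyMaxes) {
      result = Math.max(result, Long.parseLong(v));
    }
    return result;
  }

  // Total of the daily counts.
  static long sum(List<String> dailyCounts) {
    long result = 0;
    for (String v : dailyCounts) {
      result += Long.parseLong(v);
    }
    return result;
  }

  public static void main(String[] args) {
    // Two hypothetical days of summaries for one rowid.
    List<String> mins = Arrays.asList("999", "1024");
    List<String> maxes = Arrays.asList("2048", "12500");
    List<String> counts = Arrays.asList("2", "2");

    System.out.println("min=" + min(mins) + ", max=" + max(maxes) + ", count=" + sum(counts));
  }
}
```

Because each overall figure depends only on the surviving daily summaries, dropping a day's entries needs no recalculation of stored data.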

The timestamp is truncated to a day's timestamp, e.g. 1111100000000 in the example below.  This approach allows us to age off a day's worth of data without having to recalculate the summary data, because it is calculated by the scan iterators.
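
The message does not say how the truncation is done. One possible approach, sketched here under the assumption that timestamps are epoch milliseconds and that days are cut at UTC midnight, is to round down to a multiple of the number of milliseconds in a day. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class DayTruncation {

  static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

  // Round an epoch-millisecond timestamp down to the start of its UTC day.
  // floorDiv keeps the rounding direction correct for negative timestamps too.
  static long truncateToUtcDay(long epochMillis) {
    return Math.floorDiv(epochMillis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
  }

  public static void main(String[] args) {
    long morning = MILLIS_PER_DAY + TimeUnit.HOURS.toMillis(9);
    long evening = MILLIS_PER_DAY + TimeUnit.HOURS.toMillis(21);

    // Both timestamps fall on the same day, so they share one truncated value.
    System.out.println(truncateToUtcDay(morning) == truncateToUtcDay(evening));
  }
}
```

All entries written on the same day then share one truncated value, which is what lets a whole day be aged off as a unit.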

The problem

The issue I have come across is that, when the scan iterators are added, I get different results depending on the priorities of the MinCombiner and MaxCombiner.  Changing the priority of the SummingCombiner seems to have no effect.  If the MinCombiner's priority is higher (smaller number) than the MaxCombiner's, the result is correct, but if I switch the priorities and give the MaxCombiner the higher priority, the result is incorrect and the MinCombiner is not run.


This looks like
----------------------------------------------------------------------------

Range range = new Range("harry", "harry~");

//Setup the MIN
IteratorSetting isTotalMin = new IteratorSetting(15, "Min Calc", MinCombiner.class);
MinCombiner.setColumns(isTotalMin, Collections.singletonList(new IteratorSetting.Column("min")));
MinCombiner.setEncodingType(isTotalMin, MinCombiner.Type.STRING);

//Setup the MAX
IteratorSetting isTotalMax = new IteratorSetting(16, "Max Calc", MaxCombiner.class);
MaxCombiner.setColumns(isTotalMax, Collections.singletonList(new IteratorSetting.Column("max")));
MaxCombiner.setEncodingType(isTotalMax, MaxCombiner.Type.STRING);

//Setup the COUNT
IteratorSetting isTotalCount = new IteratorSetting(17, "Count Calc", SummingCombiner.class);
SummingCombiner.setColumns(isTotalCount, Collections.singletonList(new IteratorSetting.Column("count")));
SummingCombiner.setEncodingType(isTotalCount, SummingCombiner.Type.STRING);

Scanner s = connector.createScanner(tableName, new Authorizations("L1", "L2"));
s.addScanIterator(isTotalCount);
s.addScanIterator(isTotalMin);
s.addScanIterator(isTotalMax);
s.setRange(range);
s.fetchColumnFamily(new Text("count"));
s.fetchColumnFamily(new Text("min"));
s.fetchColumnFamily(new Text("max"));
for (Entry<Key, Value> e : s) {
  System.out.println(e.getKey().getRow() + ", " + e.getKey().getColumnFamily() + ", " + e.getKey().getColumnQualifier() + ", VALUE: " + e.getValue());
}

--------------------------------------------------------------

If I run the above I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500
harry, min, 1111100000000, VALUE: 999

This is correct.

However, if I change the priority of the MaxCombiner to 14 and leave the MinCombiner at 15, I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500

I lose the min value altogether.  I have tested altering the priority of the SummingCombiner but it doesn't seem to have any effect.

This may be due to the way I have set up the iterators, or it could be an Accumulo bug.

Keen to hear any thoughts.

Thanks in advance,
Matt

IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.

Re: MinCombiner and MaxCombiner priority issue [SEC=UNCLASSIFIED]

Posted by Adam Fuchs <af...@apache.org>.
Hi Matt,

I tried to replicate the behavior you saw and was not able to do so. There
must be some other factors involved. Can you describe what version of
Accumulo you have running and anything else that might be unique about the
instance (other iterators configured on the table, any additional code that
might be in play, etc.)? My test class is listed below, and I ran it with
Accumulo 1.4.2.

Cheers,
Adam


================================= IteratorPriorityTest.java ==============================

import java.util.Collections;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.MaxCombiner;
import org.apache.accumulo.core.iterators.user.MinCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;


public class IteratorPriorityTest {

  private static void writeData(int value, BatchWriter writer) throws Exception {
    Mutation m = new Mutation("harry");
    m.put("count", "1111100000000", "1");
    writer.addMutation(m);
    Mutation m2 = new Mutation("harry");
    m2.put("min", "1111100000000", Integer.toString(value));
    writer.addMutation(m2);
    Mutation m3 = new Mutation("harry");
    m3.put("max", "1111100000000", Integer.toString(value));
    writer.addMutation(m3);
  }

  /**
   * @param args
   */
  public static void main(String[] args) throws Exception {
    Instance inst = new ZooKeeperInstance("instance", "localhost");
    Connector conn = inst.getConnector("root", "password");
    conn.tableOperations().create("test");

    BatchWriter writer = conn.createBatchWriter("test", 1000000, 100, 1);
    writeData(999,writer);
    writeData(12500,writer);
    writeData(1024,writer);
    writeData(2048,writer);
    writer.close();


    Range range = new Range("harry", "harry~");

    //Setup the MIN
    IteratorSetting isTotalMin = new IteratorSetting(15, "Min Calc", MinCombiner.class);
    MinCombiner.setColumns(isTotalMin, Collections.singletonList(new IteratorSetting.Column("min")));
    MinCombiner.setEncodingType(isTotalMin, MinCombiner.Type.STRING);

    //Setup the MAX (priority 14, i.e. ahead of the MinCombiner)
    IteratorSetting isTotalMax = new IteratorSetting(14, "Max Calc", MaxCombiner.class);
    MaxCombiner.setColumns(isTotalMax, Collections.singletonList(new IteratorSetting.Column("max")));
    MaxCombiner.setEncodingType(isTotalMax, MaxCombiner.Type.STRING);

    //Setup the COUNT
    IteratorSetting isTotalCount = new IteratorSetting(17, "Count Calc", SummingCombiner.class);
    SummingCombiner.setColumns(isTotalCount, Collections.singletonList(new IteratorSetting.Column("count")));
    SummingCombiner.setEncodingType(isTotalCount, SummingCombiner.Type.STRING);

    Scanner s = conn.createScanner("test", new Authorizations());
    s.addScanIterator(isTotalCount);
    s.addScanIterator(isTotalMin);
    s.addScanIterator(isTotalMax);
    s.setRange(range);
    s.fetchColumnFamily(new Text("count"));
    s.fetchColumnFamily(new Text("min"));
    s.fetchColumnFamily(new Text("max"));
    for (Entry<Key, Value> e : s) {
      System.out.println(e.getKey().getRow() + ", " + e.getKey().getColumnFamily() + ", "
          + e.getKey().getColumnQualifier() + ", VALUE: " + e.getValue());
    }

    conn.tableOperations().delete("test");
  }

}

==============================================================================================
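
For intuition about why priority would not be expected to change the outcome here, the following is a toy model in plain Java, not Accumulo's implementation. It assumes only that iterators with lower priority numbers are applied first and that each combiner reduces the entries of its own column family while passing everything else through unchanged. All class and method names are hypothetical. The data mirrors the four values written by the test class above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.LongBinaryOperator;

public class CombinerStackModel {

  // A toy key/value pair: column family plus a numeric value.
  static final class Cell {
    final String family;
    final long value;

    Cell(String family, long value) {
      this.family = family;
      this.value = value;
    }
  }

  // A toy combiner: reduces cells of one family, passes all others through.
  static final class ToyCombiner {
    final int priority;
    final String family;
    final LongBinaryOperator op;

    ToyCombiner(int priority, String family, LongBinaryOperator op) {
      this.priority = priority;
      this.family = family;
      this.op = op;
    }

    List<Cell> apply(List<Cell> input) {
      List<Cell> output = new ArrayList<>();
      int slot = -1; // position in output of the combined cell, once seen
      for (Cell c : input) {
        if (!c.family.equals(family)) {
          output.add(c);
        } else if (slot < 0) {
          slot = output.size();
          output.add(c);
        } else {
          long merged = op.applyAsLong(output.get(slot).value, c.value);
          output.set(slot, new Cell(family, merged));
        }
      }
      return output;
    }
  }

  // Builds the sample data, applies the combiners in priority order
  // (lowest number first) and renders the surviving cells.
  static String run(int minPriority, int maxPriority, int sumPriority) {
    long[] values = {999, 12500, 1024, 2048};
    List<Cell> cells = new ArrayList<>();
    for (int i = 0; i < values.length; i++) cells.add(new Cell("count", 1));
    for (long v : values) cells.add(new Cell("max", v));
    for (long v : values) cells.add(new Cell("min", v));

    List<ToyCombiner> stack = new ArrayList<>(Arrays.asList(
        new ToyCombiner(minPriority, "min", Math::min),
        new ToyCombiner(maxPriority, "max", Math::max),
        new ToyCombiner(sumPriority, "count", Long::sum)));
    stack.sort(Comparator.comparingInt(c -> c.priority));

    for (ToyCombiner c : stack) {
      cells = c.apply(cells);
    }

    StringBuilder sb = new StringBuilder();
    for (Cell c : cells) {
      if (sb.length() > 0) sb.append(", ");
      sb.append(c.family).append('=').append(c.value);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(run(15, 16, 17)); // MinCombiner ahead of MaxCombiner
    System.out.println(run(15, 14, 17)); // MaxCombiner ahead of MinCombiner
  }
}
```

In this model both orderings give the same three entries, since no combiner touches another's column family. A missing min entry on a real instance would therefore point to something outside this simple picture, such as other iterators configured on the table.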

