Posted to user@accumulo.apache.org by Matthew Purdy <mp...@gmail.com> on 2014/07/02 04:41:00 UTC

scan iterator that rolls up col vis

USE CASE: on scan only; want to have a "summing combiner" that rolls
up by (rowId, colfam, colqual) on all row keys where the client has
visibility.

below is a simple example that expresses the use case.

accumulo table holding student to professor relationship by departments


+----------+------------------+-----------+--------------+-----+
|  rowId   |       colfam     |  colqual  |    colvis    | val |
+----------+------------------+-----------+--------------+-----+
| student1 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
| student1 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
| student1 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
| student1 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
| student2 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
| student2 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
+----------+------------------+-----------+--------------+-----+


with the summing combiner the results would be

+----------+------------------+-----------+--------------+-----+
|  rowId   |       colfam     |  colqual  |    colvis    | val |
+----------+------------------+-----------+--------------+-----+
| student1 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   2 |
| student1 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   2 |
| student2 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
| student2 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
+----------+------------------+-----------+--------------+-----+

- the math department can only see math department totals
- the com sci department can only see com sci department totals
- the office of the dean has access to both

therefore when scanning (it wouldn't work for compaction), how
can you sum over colvis?

assuming you had both colvis access the desired results would be:

+----------+------------------+-----------+-----+
|  rowId   |       colfam     |  colqual  | val |
+----------+------------------+-----------+-----+
| student1 | TAKES_CLASS_WITH |  prof1    |   4 |
| student2 | TAKES_CLASS_WITH |  prof1    |   2 |
+----------+------------------+-----------+-----+

Re: scan iterator that rolls up col vis

Posted by Matthew Purdy <mp...@gmail.com>.
sorry for not getting back.

it seems to me a very simple solution is modifying
Combiner.ValueIterator._hasNext() to compare on
PartialKey.ROW_COLFAM_COLQUAL instead of PartialKey.ROW_COLFAM_COLQUAL_COLVIS

the only problem with this is you lose the CV; however, by adding a new
getKey() method to Combiner.ValueIterator you can build up the CV field from
the set of all CVs within a matching PartialKey.ROW_COLFAM_COLQUAL

also, with a public setPartialKey() method on Combiner you could roll up on
any PartialKey very easily.

// note: i am currently using 1.4.4
private boolean _hasNext() {
  return source.hasTop() && !source.getTopKey().isDeleted()
      && topKey.equals(source.getTopKey(), PartialKey.ROW_COLFAM_COLQUAL);
}
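
The effect of comparing on ROW_COLFAM_COLQUAL rather than ROW_COLFAM_COLQUAL_COLVIS can be illustrated without Accumulo. The following is only a sketch in plain Java: RollupSketch, Cell, rollUp and sampleCells are hypothetical names made up for illustration, not Accumulo classes or methods. It assumes the input is already in scan (sorted) order and contains only cells the scanner is authorized to see, and joining the distinct CVs with '&' is just one assumed way to build the combined CV field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; none of these types are Accumulo classes.
final class RollupSketch {

    // Hypothetical stand-in for one key/value pair returned by a scan.
    static final class Cell {
        final String row, colfam, colqual, colvis;
        final long val;

        Cell(String row, String colfam, String colqual, String colvis, long val) {
            this.row = row;
            this.colfam = colfam;
            this.colqual = colqual;
            this.colvis = colvis;
            this.val = val;
        }

        // Same idea as comparing on PartialKey.ROW_COLFAM_COLQUAL: colvis is ignored.
        boolean sameRowColfamColqual(Cell other) {
            return row.equals(other.row)
                && colfam.equals(other.colfam)
                && colqual.equals(other.colqual);
        }

        @Override
        public String toString() {
            return row + " " + colfam + " " + colqual + " [" + colvis + "] " + val;
        }
    }

    // Sums each run of cells sharing (row, colfam, colqual) and keeps the distinct CVs seen.
    static List<Cell> rollUp(List<Cell> sorted) {
        List<Cell> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Cell start = sorted.get(i);
            long sum = 0;
            Set<String> seenVis = new LinkedHashSet<>();
            while (i < sorted.size() && start.sameRowColfamColqual(sorted.get(i))) {
                sum += sorted.get(i).val;
                seenVis.add(sorted.get(i).colvis);
                i++;
            }
            // Assumption: join the distinct CVs with '&' to form the rolled-up CV field.
            out.add(new Cell(start.row, start.colfam, start.colqual,
                             String.join("&", seenVis), sum));
        }
        return out;
    }

    // The six cells from the example table, in sorted (scan) order.
    static List<Cell> sampleCells() {
        String cf = "TAKES_CLASS_WITH";
        return Arrays.asList(
            new Cell("student1", cf, "prof1", "COM_SCI_DEPT", 1),
            new Cell("student1", cf, "prof1", "COM_SCI_DEPT", 1),
            new Cell("student1", cf, "prof1", "MATH_DEPT", 1),
            new Cell("student1", cf, "prof1", "MATH_DEPT", 1),
            new Cell("student2", cf, "prof1", "COM_SCI_DEPT", 1),
            new Cell("student2", cf, "prof1", "MATH_DEPT", 1));
    }

    public static void main(String[] args) {
        for (Cell c : rollUp(sampleCells())) {
            System.out.println(c);
        }
    }
}
```

With both visibilities readable, the sketch yields a total of 4 for student1 and 2 for student2, matching the desired results table. If the input held only the MATH_DEPT cells, the same loop would yield 2 and 1.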


On Wed, Jul 2, 2014 at 11:27 AM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> you should be able to roll up on keys with a condition similar to:
>
> if( source.hasTop() ) {
>   Key start = new Key(source.getTopKey()); // avoid instance-reuse issues
>   long count = 0;
>   while( source.hasTop() && start.equals( source.getTopKey(),
>       PartialKey.ROW_COLFAM_COLQUAL_COLVIS ) ) {
>     count += deserialize(source.getTopValue());
>     source.next();
>   }
>   Value new_top_value = serialize(count);
>   // start can represent the top key of the iterator
> }
>
> We can flesh this out further if you run into issues. I think that we may
> need to set the start key's timestamp to 0 so that it sorts after all the
> other cells with a similar prefix.


-- 
Thank You,
Matthew Purdy


Re: scan iterator that rolls up col vis

Posted by William Slacum <wi...@accumulo.net>.
you should be able to roll up on keys with a condition similar to:

if( source.hasTop() ) {
  Key start = new Key(source.getTopKey()); // avoid instance-reuse issues
  long count = 0;
  while( source.hasTop() && start.equals( source.getTopKey(),
      PartialKey.ROW_COLFAM_COLQUAL_COLVIS ) ) {
    count += deserialize(source.getTopValue());
    source.next();
  }
  Value new_top_value = serialize(count);
  // start can represent the top key of the iterator
}

We can flesh this out further if you run into issues. I think that we may
need to set the start key's timestamp to 0 so that it sorts after all the
other cells with a similar prefix.
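
The timestamp remark relies on Accumulo sorting keys that are otherwise equal by timestamp in descending order, so a key with timestamp 0 sorts after every other version with the same prefix. A small plain-Java sketch of that ordering follows; TimestampOrderSketch, VersionedKey and KEY_ORDER are hypothetical stand-ins rather than Accumulo classes, the row and column fields are flattened into one string for brevity, and the timestamps are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; none of these types are Accumulo classes.
final class TimestampOrderSketch {

    // Hypothetical stand-in for a key: the row/column fields flattened into one string.
    static final class VersionedKey {
        final String prefix;
        final long timestamp;

        VersionedKey(String prefix, long timestamp) {
            this.prefix = prefix;
            this.timestamp = timestamp;
        }
    }

    // Prefix ascending, then timestamp DESCENDING (newest first, 0 last).
    static final Comparator<VersionedKey> KEY_ORDER = (a, b) -> {
        int c = a.prefix.compareTo(b.prefix);
        return c != 0 ? c : Long.compare(b.timestamp, a.timestamp);
    };

    // Returns a sorted copy so the caller's list is left untouched.
    static List<VersionedKey> sorted(List<VersionedKey> keys) {
        List<VersionedKey> copy = new ArrayList<>(keys);
        copy.sort(KEY_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String prefix = "student1 TAKES_CLASS_WITH prof1";
        List<VersionedKey> keys = new ArrayList<>();
        keys.add(new VersionedKey(prefix, 1404268860000L)); // made-up timestamp
        keys.add(new VersionedKey(prefix, 0L));             // rolled-up key with timestamp 0
        keys.add(new VersionedKey(prefix, 1404300420000L)); // made-up timestamp
        for (VersionedKey k : sorted(keys)) {
            System.out.println(k.prefix + " @ " + k.timestamp);
        }
    }
}
```

In this sketch the key with timestamp 0 is printed last, after both of the cells it would summarize.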

