Posted to user@accumulo.apache.org by Benson Margulies <bi...@gmail.com> on 2012/02/14 15:54:31 UTC

Finding my way through the forest of iterators

I'm working with 1.3.5, and there's not a ton of javadoc in
org.apache.accumulo.core.iterators.

If I want to just filter to the rows that have a particular CF/CQ value,
does one of the existing iterator classes do the job, or do I need to write
one?

Re: Finding my way through the forest of iterators

Posted by Benson Margulies <bi...@gmail.com>.
On Wed, Feb 15, 2012 at 1:14 PM, Keith Turner <ke...@deenlo.com> wrote:
> Yeah I think you have listed the options correctly.  To be exhaustive,
> I suppose another option is to do the row selection/filtering on the
> client side and not use an iterator.
>
> If you happen to choose to write a custom iterator, you might as well
> write the iterator described in ACCUMULO-403, extend it, and post a
> patch :)

Once I'm working on 1.4, where I have some chance of actually using a
patch that I post, I'll be much more patch-prone.

>
> On Wed, Feb 15, 2012 at 12:40 PM, Benson Margulies
> <bi...@gmail.com> wrote:
>> So, in 1.3.5 ...
>>
>> If the idea I need to express is:
>>
>>  Given a constraint on range, cf, cq:
>>    for each row that matches the constraint
>>      return the cell that matched the constraint plus some other
>> cells specified by cf/cq
>>
>> For now, I could use the attributes of a scanner to express the
>> constraint, and then WholeRowIterator (if I can afford the memory)
>>
>> or I can write my own iterator
>>
>> or I can wait for the thing described in the jira?

Re: Finding my way through the forest of iterators

Posted by Keith Turner <ke...@deenlo.com>.
Yeah I think you have listed the options correctly.  To be exhaustive,
I suppose another option is to do the row selection/filtering on the
client side and not use an iterator.

If you happen to choose to write a custom iterator, you might as well
write the iterator described in ACCUMULO-403, extend it, and post a
patch :)
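
The client-side option can be sketched roughly as follows. This is a minimal illustration that does not use the Accumulo API: ClientSideRowSelect, Cell, and select are hypothetical names, and the cells stand in for the entries a scan would return in sorted order. For each row it buffers only the wanted cells, then emits them if the row contained the matching column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of client-side row selection; not Accumulo code. */
final class ClientSideRowSelect {

    /** Stand-in for one key/value entry returned by a scan. */
    static final class Cell {
        final String row, cf, cq, value;
        Cell(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
        String column() { return cf + ":" + cq; }
        @Override public String toString() { return row + " " + column() + "=" + value; }
    }

    /**
     * cells must arrive grouped by row, as a scan returns them.
     * Keeps only rows that contain matchColumn and, from those rows, only
     * matchColumn plus the columns listed in extraColumns.
     */
    static List<Cell> select(Iterable<Cell> cells, String matchColumn, Set<String> extraColumns) {
        List<Cell> out = new ArrayList<Cell>();
        List<Cell> pending = new ArrayList<Cell>(); // wanted cells of the current row
        String currentRow = null;
        boolean matched = false;
        for (Cell c : cells) {
            if (!c.row.equals(currentRow)) {
                // Row boundary: emit the previous row if it matched, else drop it.
                if (matched) out.addAll(pending);
                pending.clear();
                matched = false;
                currentRow = c.row;
            }
            if (c.column().equals(matchColumn)) {
                matched = true;
                pending.add(c);
            } else if (extraColumns.contains(c.column())) {
                pending.add(c);
            }
        }
        if (matched) out.addAll(pending); // flush the last row
        return out;
    }
}
```

The trade-off is that every cell in the scanned range still travels to the client, which then discards the rows that do not match.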

On Wed, Feb 15, 2012 at 12:40 PM, Benson Margulies
<bi...@gmail.com> wrote:
> So, in 1.3.5 ...
>
> If the idea I need to express is:
>
>  Given a constraint on range, cf, cq:
>    for each row that matches the constraint
>      return the cell that matched the constraint plus some other
> cells specified by cf/cq
>
> For now, I could use the attributes of a scanner to express the
> constraint, and then WholeRowIterator (if I can afford the memory)
>
> or I can write my own iterator
>
> or I can wait for the thing described in the jira?

Re: Finding my way through the forest of iterators

Posted by Benson Margulies <bi...@gmail.com>.
So, in 1.3.5 ...

If the idea I need to express is:

 Given a constraint on range, cf, cq:
    for each row that matches the constraint
      return the cell that matched the constraint plus some other
cells specified by cf/cq

For now, I could use the attributes of a scanner to express the
constraint, and then WholeRowIterator (if I can afford the memory)

or I can write my own iterator

or I can wait for the thing described in the jira?

Re: Finding my way through the forest of iterators

Posted by Keith Turner <ke...@deenlo.com>.
FYI,

I opened this ticket.

https://issues.apache.org/jira/browse/ACCUMULO-403

This may not help now, since the iterator does not exist yet.  But your
question made me realize that we should probably implement a generalized
row selection iterator that works even if the row does not fit into
memory.


On Tue, Feb 14, 2012 at 11:39 AM, Benson Margulies
<bi...@gmail.com> wrote:
> Keith,
>
> I think I'm good with the simple filtering stuff right now. I'm aiming to
> make a second pass once 1.4 is released to revisit the problem of
> aggregation across CQ instead of only on Value.
>
> --benson
>
>
> On Tue, Feb 14, 2012 at 11:23 AM, Keith Turner <ke...@deenlo.com> wrote:
>>
>> The WholeRowIterator is one way to do this.  One drawback is that it
>> reads entire rows into memory.
>>
>> If rows may not fit into memory, there is an efficient way to handle
>> this using two iterators.  One iterator creates another iterator that
>> is used to determine whether a row contains the column; if not, it
>> seeks the original iterator past the row.  If the second iterator
>> shows that the row contains the column, then you can read the row from
>> the original iterator.  This design allows you to efficiently return
>> rows that meet particular criteria without reading the rows into memory.
>>  If you are interested in learning more I can point you to examples in
>> the 1.4 code.
>>
>>
>> On Tue, Feb 14, 2012 at 9:54 AM, Benson Margulies <bi...@gmail.com>
>> wrote:
>> > I'm working with 1.3.5, and there's not a ton of javadoc in
>> > org.apache.accumulo.core.iterators.
>> >
>> > If I want to just filter to the rows that have a particular CF/CQ value,
>> > does one of the existing iterator classes do the job, or do I need to
>> > write one?
>> >
>
>

Re: Finding my way through the forest of iterators

Posted by Benson Margulies <bi...@gmail.com>.
Keith,

I think I'm good with the simple filtering stuff right now. I'm aiming to
make a second pass once 1.4 is released to revisit the problem of
aggregation across CQ instead of only on Value.

--benson


On Tue, Feb 14, 2012 at 11:23 AM, Keith Turner <ke...@deenlo.com> wrote:

> The WholeRowIterator is one way to do this.  One drawback is that it
> reads entire rows into memory.
>
> If rows may not fit into memory, there is an efficient way to handle
> this using two iterators.  One iterator creates another iterator that
> is used to determine whether a row contains the column; if not, it
> seeks the original iterator past the row.  If the second iterator
> shows that the row contains the column, then you can read the row from
> the original iterator.  This design allows you to efficiently return
> rows that meet particular criteria without reading the rows into memory.
>  If you are interested in learning more I can point you to examples in
> the 1.4 code.
>
>
> On Tue, Feb 14, 2012 at 9:54 AM, Benson Margulies <bi...@gmail.com>
> wrote:
> > I'm working with 1.3.5, and there's not a ton of javadoc in
> > org.apache.accumulo.core.iterators.
> >
> > If I want to just filter to the rows that have a particular CF/CQ value,
> > does one of the existing iterator classes do the job, or do I need to
> > write one?
> >
>

Re: Finding my way through the forest of iterators

Posted by Keith Turner <ke...@deenlo.com>.
The WholeRowIterator is one way to do this.  One drawback is that it
reads entire rows into memory.

If rows may not fit into memory, there is an efficient way to handle
this using two iterators.  One iterator creates another iterator that
is used to determine whether a row contains the column; if not, it
seeks the original iterator past the row.  If the second iterator
shows that the row contains the column, then you can read the row from
the original iterator.  This design allows you to efficiently return
rows that meet particular criteria without reading the rows into memory.
 If you are interested in learning more I can point you to examples in
the 1.4 code.
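
One reading of the two-iterator design can be modeled as below. This is a toy sketch under stated assumptions, not the 1.4 code and not the Accumulo iterator API: TwoIteratorRowSelect, Cursor, and Cell are hypothetical names, and a sorted in-memory list stands in for a seekable source. The probe cursor plays the role of the second iterator, and seek is a binary search standing in for an index-backed seek.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two-iterator idea; all names here are hypothetical. */
final class TwoIteratorRowSelect {

    /** One entry; col is "cf:cq". Data is sorted by row, then col. */
    static final class Cell {
        final String row, col, value;
        Cell(String row, String col, String value) {
            this.row = row; this.col = col; this.value = value;
        }
    }

    /** Seekable cursor over sorted cells; copies share the data but not the position. */
    static final class Cursor {
        private final List<Cell> data;
        private int pos;
        Cursor(List<Cell> data, int pos) { this.data = data; this.pos = pos; }
        boolean hasTop() { return pos < data.size(); }
        Cell top() { return data.get(pos); }
        void next() { pos++; }
        Cursor copy() { return new Cursor(data, pos); }

        /** Position at the first cell that sorts at or after (row, col). */
        void seek(String row, String col) {
            int lo = 0, hi = data.size();
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                Cell c = data.get(mid);
                int cmp = c.row.compareTo(row);
                if (cmp == 0) cmp = c.col.compareTo(col);
                if (cmp < 0) lo = mid + 1; else hi = mid;
            }
            pos = lo;
        }
    }

    /** Returns every cell of each row that contains matchCol. */
    static List<Cell> select(List<Cell> sorted, String matchCol) {
        Cursor source = new Cursor(sorted, 0);
        List<Cell> out = new ArrayList<Cell>();
        while (source.hasTop()) {
            String row = source.top().row;
            Cursor probe = source.copy();   // second, independent iterator
            probe.seek(row, matchCol);      // jump straight to the column
            boolean hit = probe.hasTop()
                    && probe.top().row.equals(row)
                    && probe.top().col.equals(matchCol);
            if (hit) {
                // Stream the row from the original cursor, one cell at a time.
                while (source.hasTop() && source.top().row.equals(row)) {
                    out.add(source.top());
                    source.next();
                }
            } else {
                // Skip the row without reading its cells: seek to the
                // smallest possible row that sorts after this one.
                source.seek(row + "\u0000", "");
            }
        }
        return out;
    }
}
```

The sketch collects results in a list only so they can be inspected; the point of the design is that the selection logic itself never holds more than one cell of a row at a time.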


On Tue, Feb 14, 2012 at 9:54 AM, Benson Margulies <bi...@gmail.com> wrote:
> I'm working with 1.3.5, and there's not a ton of javadoc in
> org.apache.accumulo.core.iterators.
>
> If I want to just filter to the rows that have a particular CF/CQ value,
> does one of the existing iterator classes do the job, or do I need to write
> one?
>

Re: Finding my way through the forest of iterators

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Tuesday, February 14, 2012 9:54:31 AM, "Benson Margulies" <bi...@gmail.com> wrote:
> I'm working with 1.3.5, and there's not a ton of javadoc in
> org.apache.accumulo.core.iterators.
> 
> 
> If I want to just filter to the rows that have a particular CF/CQ
> value, does one of the existing iterator classes do the job, or do I
> need to write one?

The Scanner has built-in column filtering with the fetchColumn(Text colFam, Text colQual) and fetchColumnFamily(Text col) methods.  This is implemented as an iterator behind the scenes, but you don't have to configure it as such.  There is also a separate Java regex capability if you aren't matching a specific column.

Billie
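
To make the behavior of column fetching concrete, here is a small model of it. This is not Accumulo code and does not call the Scanner; ColumnFetchModel and Cell are hypothetical stand-ins, and the two method names are borrowed from the Scanner methods mentioned above only for readability. The model assumes that fetching restricts which cells come back, one cell at a time, and that a scan with nothing fetched returns every column.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of Scanner-style column fetching; none of this is Accumulo code. */
final class ColumnFetchModel {

    /** Stand-in for one key/value entry. */
    static final class Cell {
        final String row, cf, cq, value;
        Cell(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
    }

    private final Set<String> families = new HashSet<String>();
    private final Set<String> columns = new HashSet<String>();

    /** Ask for every cell in the given column family. */
    void fetchColumnFamily(String cf) { families.add(cf); }

    /** Ask for cells with exactly this family and qualifier. */
    void fetchColumn(String cf, String cq) { columns.add(cf + "\u0000" + cq); }

    /** Returns the cells that pass the fetch settings, in input order. */
    List<Cell> scan(List<Cell> cells) {
        boolean unrestricted = families.isEmpty() && columns.isEmpty();
        List<Cell> out = new ArrayList<Cell>();
        for (Cell c : cells) {
            if (unrestricted
                    || families.contains(c.cf)
                    || columns.contains(c.cf + "\u0000" + c.cq)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

Because the decision is made per cell, fetching a second column also returns that column from rows that lack the first one, so this alone does not select whole rows by the presence of a column.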