Posted to dev@accumulo.apache.org by "Patrone, Dennis S." <De...@jhuapl.edu> on 2012/10/30 16:02:44 UTC

IteratorSetting and priorities

Hi all,

Is there a reason that ScannerOptions only allows a single iterator per priority value?  It seems that multiple iterators added at the same priority could just be executed in an arbitrary order by the system.

I have a ScannerBase that gets passed around through several classes.  These classes add different filters (for different reasons) to the scanner based on the particular request being processed and user configuration.  Requiring only one filter per priority imposes a dependency among the different classes managing the filters.  They have to coordinate to make sure no one reuses the same priority.

I'd rather be able to set priorities based on the (expected) selectivity of the filter only within the class adding a subset of the filters, and let the cross-'domain' filtering priorities be managed automatically by Accumulo.

Even worse, the ScannerBase API does not provide access to the already-added IteratorSettings or even the min/max iterator priority, so I have no way AFAICT to ensure via the API that my iterator priority is not in conflict with an existing priority.  I have to manage the priority value through an unenforceable convention... and wait for a RuntimeException(!) to tell me when the convention is violated.

I think minimally an accessor method needs to be added so I can ensure my priority isn't going to clash and cause an IllegalArgumentException.

Ideally, I'd like to see filters added at the same priority allowed and just executed in some arbitrary order (or some well-defined order within the priority, e.g., in the order they were added).
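
As a rough illustration of the "well-defined order within the priority" idea, the sketch below collects requests that may share a priority and resolves ties by insertion order. Everything here is hypothetical: `TieBreakingPlan` and its methods are not part of the Accumulo API, and iterators are represented by plain names rather than real IteratorSetting objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical client-side helper; NOT part of the Accumulo API.
final class TieBreakingPlan {
    // One requested iterator: a caller-chosen priority (ties allowed) and a name.
    private static final class Request {
        final int priority;
        final String name;

        Request(int priority, String name) {
            this.priority = priority;
            this.name = name;
        }
    }

    private final List<Request> requests = new ArrayList<>();

    // Record a request; several requests may use the same priority.
    void add(int priority, String name) {
        requests.add(new Request(priority, name));
    }

    // Names in execution order: ascending priority, ties in insertion order.
    // List.sort is a stable sort, so equal priorities keep the order they were added in.
    List<String> executionOrder() {
        List<Request> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingInt((Request r) -> r.priority));
        List<String> names = new ArrayList<>();
        for (Request r : sorted) {
            names.add(r.name);
        }
        return names;
    }
}
```

A caller could then hand out unique, consecutive priorities by walking the returned list, so no two iterators ever reach the scanner with the same value.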

I'd be willing to contribute some updates for this, but before I started I wanted to see if this is reasonable, if anyone else thinks it is a good idea, or if there are real valid reasons only one iterator per priority is allowed.

Thanks,
Dennis


Dennis Patrone
The Johns Hopkins University / Applied Physics Laboratory
240-228-2285 / Washington
443-778-2285 / Baltimore
443-220-7190 / Cell
dennis.patrone@jhuapl.edu


Re: IteratorSetting and priorities

Posted by William Slacum <wi...@accumulo.net>.
It's because you're building a stack of iterators: the order you set on the
scanner is the order in which sources are created and passed to init() for
each iterator in the stack when the scan executes on a TServer. Although
deprecated, the filtering API in 1.3 does allow you to set multiple filters
at the same priority, though it is broken in certain cases.

The semantics of set up and call order are such that "I want my KVs coming
out of iterator A at priority N to be handled by iterator B at priority N +
1." If you want function composition of your predicates, then increasing
priorities/positions in the stack is the correct approach. I think most, if
not all, of what you want can be accomplished via client side helpers.

For an example of where a tree model is used (and you'll be able to see that
trees can't actually be defined on the client side), check out the
IntersectingIterator.

On Wed, Oct 31, 2012 at 7:52 AM, Patrone, Dennis S. <
Dennis.Patrone@jhuapl.edu> wrote:

> > The issue with giving multiple iterators the same priority is that the
> > API specifies that during the call to init(), one source is given the
> > iterator.
>
> I fail to see how this is an issue.  I don't really want a "tree" of
> iterators (I'm not sure how you'd combine the multiple results moving back
> up the tree).  I still want a straight line of iterators, I just don't want
> to have to worry about ordering within a "set" of them at the same priority
> level.
>
> So right now if I add I1 @ priority 1, I2 @ priority 2, and I3 @ priority
> 3, then basically (as I understand it, at least) the output of I1 is fed
> into I2.  Then the output of I2 is fed into I3.
>
> What I want is the API to allow me to add I2 and I3 at priority 2.  Then
> the system has two choices to process my request:
>
> I1 -> I2 -> I3
>
> ...OR...
>
> I1 -> I3 -> I2
>
> Based on my priority values, I don't care which processing chain is
> followed; either is correct.
>
> What I'm NOT asking for is this:
>
>       I2
>     /
> I1
>     \
>       I3
>
> Am I missing something?
>
> Billie- I also looked at ACCUMULO-759.  I need some time later to read
> through it and follow the discussion but then I will try to add something
> coherent.
>
> Thanks,
> Dennis
>
>

RE: IteratorSetting and priorities

Posted by "Patrone, Dennis S." <De...@jhuapl.edu>.
> The issue with giving multiple iterators the same priority is that the API specifies that during the call to init(), one source is given the iterator.

I fail to see how this is an issue.  I don't really want a "tree" of iterators (I'm not sure how you'd combine the multiple results moving back up the tree).  I still want a straight line of iterators, I just don't want to have to worry about ordering within a "set" of them at the same priority level.

So right now if I add I1 @ priority 1, I2 @ priority 2, and I3 @ priority 3, then basically (as I understand it, at least) the output of I1 is fed into I2.  Then the output of I2 is fed into I3.

What I want is the API to allow me to add I2 and I3 at priority 2.  Then the system has two choices to process my request:

I1 -> I2 -> I3

...OR...

I1 -> I3 -> I2

Based on my priority values, I don't care which processing chain is followed; either is correct.  
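
The claim that either chain is correct holds for pure filters, since each one only drops entries and never rewrites them. A minimal sketch of that reasoning, using plain integers and `java.util.function.Predicate` as stand-ins for key/value pairs and Accumulo filters (the class `FilterChainDemo` is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustration only: integers stand in for key/value entries,
// and Predicate stands in for a filtering iterator.
final class FilterChainDemo {
    // Feed the output of each filter into the next one, in list order.
    static List<Integer> run(List<Integer> source, List<Predicate<Integer>> chain) {
        List<Integer> current = source;
        for (Predicate<Integer> filter : chain) {
            List<Integer> next = new ArrayList<>();
            for (Integer value : current) {
                if (filter.test(value)) {
                    next.add(value);
                }
            }
            current = next;
        }
        return current;
    }
}
```

Running the values 1 through 10 through "is even" then "greater than 4", or through the same two filters in the opposite order, yields 6, 8, 10 both times. This would not hold in general for iterators that transform or combine entries, which is one reason ordering still matters for non-filter iterators.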

What I'm NOT asking for is this:

      I2
    /
I1
    \ 
      I3

Am I missing something?

Billie- I also looked at ACCUMULO-759.  I need some time later to read through it and follow the discussion but then I will try to add something coherent.

Thanks,
Dennis


Re: IteratorSetting and priorities

Posted by William Slacum <wi...@accumulo.net>.
The issue with giving multiple iterators the same priority is that the API
specifies that during the call to init(), each iterator is given exactly one
source. Now, that iterator can make multiple copies of that source via
deepCopy() to make a tree of iterators, but by default it's given one source.

In the absence of a more convenient API for tracking priorities, you could
create a Queue<IteratorSetting>, push the filters you want onto it, and then
apply each IteratorSetting to the Scanner in turn once you're done.
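
A minimal sketch of that queue approach, written without an Accumulo dependency so it stands alone. The names `IteratorQueue`, `Sink`, and `applyTo` are hypothetical; `Sink` is a stand-in for the scanner, and in real code its single method would presumably wrap something like `scanner.addScanIterator(new IteratorSetting(priority, name, iteratorClass))`.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical client-side helper; NOT part of the Accumulo API.
final class IteratorQueue {
    // Stand-in for the one scanner operation this sketch needs.
    interface Sink {
        void addScanIterator(int priority, String name);
    }

    // Iterator names waiting to be applied, in the order they were enqueued.
    private final Queue<String> pending = new ArrayDeque<>();

    // Any class holding this queue can add a filter without choosing a priority.
    void enqueue(String name) {
        pending.add(name);
    }

    // Drain the queue in FIFO order, giving each entry the next consecutive
    // priority starting at firstPriority. Returns the next unused priority.
    int applyTo(Sink sink, int firstPriority) {
        int priority = firstPriority;
        while (!pending.isEmpty()) {
            sink.addScanIterator(priority, pending.remove());
            priority++;
        }
        return priority;
    }
}
```

With this shape, the classes that add filters only share the queue, and priorities are assigned in one place at the end, so two of them cannot pick the same value by accident.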

Personally, I have kicked around the idea of client helpers that keep
track of priorities and provide queue- or stack-like interfaces for setting
up iterators. This doesn't solve the disparity between being able to create
trees of iterators on the server side versus only being able to create a
stack on the client side.

On Tue, Oct 30, 2012 at 11:02 AM, Patrone, Dennis S. <
Dennis.Patrone@jhuapl.edu> wrote:

> Hi all,
>
> Is there a reason that ScannerOptions only allows a single iterator per
> priority value?  It seems that multiple iterators added at the same
> priority could just be executed in an arbitrary order by the system.
>
> I have a ScannerBase that gets passed around through several classes.
>  These classes add different filters (for different reasons) to the scanner
> based on the particular request being processed and user configuration.
>  Requiring only one filter per priority imposes a dependency among the
> different classes managing the filters.  They have to coordinate to make
> sure no one reuses the same priority.
>
> I'd rather be able to set priorities based on the (expected) selectivity
> of the filter only within the class adding a subset of the filters, and let
> the cross-'domain' filtering priorities be managed automatically by
> Accumulo.
>
> Even worse, the ScannerBase API does not provide access to the
> already-added IteratorSettings or even the min/max iterator priority, so I
> have no way AFAICT to ensure via the API that my iterator priority is not
> in conflict with an existing priority.  I have to manage the priority value
> through an unenforceable convention... and wait for a RuntimeException(!)
> to tell me when the convention is violated.
>
> I think minimally an accessor method needs to be added so I can ensure my
> priority isn't going to clash and cause an IllegalArgumentException.
>
> Ideally, I'd like to see filters added at the same priority allowed and
> just executed in some arbitrary order (or some well-defined order within
> the priority, e.g., in order they were added?).
>
> I'd be willing to contribute some updates for this, but before I started I
> wanted to see if this is reasonable, if anyone else thinks it is a good
> idea, or if there are real valid reasons only one iterator per priority is
> allowed.
>
> Thanks,
> Dennis
>
>
> Dennis Patrone
> The Johns Hopkins University / Applied Physics Laboratory
> 240-228-2285 / Washington
> 443-778-2285 / Baltimore
> 443-220-7190 / Cell
> dennis.patrone@jhuapl.edu
>
>

Re: IteratorSetting and priorities

Posted by Billie Rinaldi <bi...@gmail.com>.
On Tue, Oct 30, 2012 at 8:02 AM, Patrone, Dennis S. <
Dennis.Patrone@jhuapl.edu> wrote:

> Hi all,
>
> Is there a reason that ScannerOptions only allows a single iterator per
> priority value?  It seems that multiple iterators added at the same
> priority could just be executed in an arbitrary order by the system.
>
> I have a ScannerBase that gets passed around through several classes.
>  These classes add different filters (for different reasons) to the scanner
> based on the particular request being processed and user configuration.
>  Requiring only one filter per priority imposes a dependency among the
> different classes managing the filters.  They have to coordinate to make
> sure no one reuses the same priority.
>
> I'd rather be able to set priorities based on the (expected) selectivity
> of the filter only within the class adding a subset of the filters, and let
> the cross-'domain' filtering priorities be managed automatically by
> Accumulo.
>
> Even worse, the ScannerBase API does not provide access to the
> already-added IteratorSettings or even the min/max iterator priority, so I
> have no way AFAICT to ensure via the API that my iterator priority is not
> in conflict with an existing priority.  I have to manage the priority value
> through an unenforceable convention... and wait for a RuntimeException(!)
> to tell me when the convention is violated.
>
> I think minimally an accessor method needs to be added so I can ensure my
> priority isn't going to clash and cause an IllegalArgumentException.
>
> Ideally, I'd like to see filters added at the same priority allowed and
> just executed in some arbitrary order (or some well-defined order within
> the priority, e.g., in order they were added?).
>
> I'd be willing to contribute some updates for this, but before I started I
> wanted to see if this is reasonable, if anyone else thinks it is a good
> idea, or if there are real valid reasons only one iterator per priority is
> allowed.
>

The discussion on ACCUMULO-759 regarding changing the API for scanner
iterators may interest you.  Feel free to weigh in.

Billie


