Posted to notifications@accumulo.apache.org by "Adam Fuchs (JIRA)" <ji...@apache.org> on 2012/09/08 22:32:07 UTC

[jira] [Created] (ACCUMULO-759) remove priority setting for scan-time iterators

Adam Fuchs created ACCUMULO-759:
-----------------------------------

             Summary: remove priority setting for scan-time iterators
                 Key: ACCUMULO-759
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-759
             Project: Accumulo
          Issue Type: Improvement
            Reporter: Adam Fuchs


Iterators have a priority setting that allows a user to order iterators arbitrarily. However, that priority is an integer that doesn't directly convey the iterator's relationship to other iterators. I would postulate that nobody has ever needed to sneak in a scan-time iterator underneath a configured table iterator (please let me know if I'm wrong about this), and the effect of doing so is not easy to calculate. Many people have chosen a bad iterator priority and seen commutativity problems with previously configured iterators.

I propose that we use more of an agglomerative approach to configuring scan-time iterators, in which the order of the iterator tree is the same order in which the addScanIterator method is called, and all scan-time iterators apply after the configured iterators apply. The change to the API should just be to remove the priority number, and the existing IteratorSetting constructor and accessors should be deprecated.

With this change, we can think of an iterator as more of a functional modification to a data set, as in T' = f(T) or T'' = g(f(T)). This should make it easier for developers to use iterators correctly.
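A minimal sketch of the proposed ordering in plain Java, with no Accumulo dependency. The class `ProposedIteratorStack` and its methods are hypothetical; each iterator is modeled as a function over a list of entries rather than a real `SortedKeyValueIterator`, and the model assumes, as in Accumulo, that lower priority numbers sit closer to the data and run first. Table iterators apply in priority order, then scan-time iterators apply in the order `addScanIterator` was called, so adding `f` and then `g` yields `g(f(T))`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical model: table iterators run in ascending priority order, then
// scan-time iterators run in the order they were added (no priority argument).
class ProposedIteratorStack {
  private final TreeMap<Integer, UnaryOperator<List<String>>> tableIterators = new TreeMap<>();
  private final List<UnaryOperator<List<String>>> scanIterators = new ArrayList<>();

  void configureTableIterator(int priority, UnaryOperator<List<String>> f) {
    tableIterators.put(priority, f);
  }

  // Call order alone determines position among scan-time iterators.
  void addScanIterator(UnaryOperator<List<String>> f) {
    scanIterators.add(f);
  }

  List<String> scan(List<String> data) {
    List<String> t = data;
    for (UnaryOperator<List<String>> f : tableIterators.values()) {
      t = f.apply(t); // configured table iterators first
    }
    for (UnaryOperator<List<String>> f : scanIterators) {
      t = f.apply(t); // T' = f(T), then T'' = g(T') = g(f(T)), ...
    }
    return t;
  }

  // Helper for illustration: an "iterator" that appends a marker entry.
  static UnaryOperator<List<String>> tag(String name) {
    return in -> {
      List<String> out = new ArrayList<>(in);
      out.add(name);
      return out;
    };
  }
}
```

The markers make the composition order visible: whatever priorities the table uses, the scan-time functions always wrap the table's result.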

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Christopher Tubbs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451413#comment-13451413 ] 

Christopher Tubbs commented on ACCUMULO-759:
--------------------------------------------

I like this approach. However, I think there may still be cases where we'd want to support injecting an iterator into the per-table configured iterator priority stack. I don't know of any particular use cases, but I wouldn't want to remove functionality.

So, I suggest deprecating the "addScanIterator" method, and replacing it with "insertScanIterator(priority, ...)" and "appendScanIterator(...)".
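A rough sketch of how the two suggested methods could behave, assuming iterators are identified only by name and that lower priorities run first. `IteratorStackSketch` is hypothetical, and computing "append" as one past the current highest priority is just one possible implementation choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of insertScanIterator / appendScanIterator semantics.
class IteratorStackSketch {
  // priority -> iterator name; lower priority runs first (closer to the data)
  private final TreeMap<Integer, String> stack = new TreeMap<>();

  IteratorStackSketch(Map<Integer, String> tableIterators) {
    stack.putAll(tableIterators);
  }

  // Explicit placement, e.g. underneath a configured table iterator.
  void insertScanIterator(int priority, String name) {
    if (stack.containsKey(priority)) {
      throw new IllegalArgumentException("priority already in use: " + priority);
    }
    stack.put(priority, name);
  }

  // No priority argument: always lands after everything currently in the stack.
  void appendScanIterator(String name) {
    int next = stack.isEmpty() ? 1 : stack.lastKey() + 1;
    stack.put(next, name);
  }

  List<String> executionOrder() {
    return new ArrayList<>(stack.values());
  }
}
```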
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Billie Rinaldi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452966#comment-13452966 ] 

Billie Rinaldi commented on ACCUMULO-759:
-----------------------------------------

> However, the boolean is more restrictive than this, because it prevents insertion of an iterator at other points in the scan.

No, the boolean method could be used along with two Scanner methods:

{code:java}
  something(IteratorSetting) // user handles priority, boolean set to false
  something(ScanIteratorSetting) // priority is handled automatically, boolean set to true
{code}

which is not to say that I'm convinced this is the way to do it.  I kind of like the port-like method you suggest, but it does break some things people were doing before (mainly setting an iterator at priority Integer.MAX_VALUE), so I wanted to suggest a method that would not.
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Dave Marion (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453584#comment-13453584 ] 

Dave Marion commented on ACCUMULO-759:
--------------------------------------

When you say IteratorChain, do you mean the one from Apache Commons Collections? If not, I would look at this package. There are some pretty cool features.
                

[jira] [Comment Edited] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Billie Rinaldi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452966#comment-13452966 ] 

Billie Rinaldi edited comment on ACCUMULO-759 at 9/11/12 11:19 PM:
-------------------------------------------------------------------

> However, the boolean is more restrictive than this, because it prevents insertion of an iterator at other points in the scan.

No, the boolean method could be used along with two Scanner methods:

{code:java}
something(IteratorSetting) // user handles priority, boolean set to false
something(ScanIteratorSetting) // priority is handled automatically, boolean set to true
{code}

which is not to say that I'm convinced this is the way to do it.  I kind of like the port-like method you suggest, but it does break some things people were doing before (mainly setting an iterator at priority Integer.MAX_VALUE), so I wanted to suggest a method that would not.
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Billie Rinaldi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452666#comment-13452666 ] 

Billie Rinaldi commented on ACCUMULO-759:
-----------------------------------------

What if we added a boolean to IterInfo to indicate whether it is a per-scan iterator or not?  We could sort them first by the boolean and then by the priority.  Then we could have a method
{code:java}
Scanner add(ScanIteratorSetting is)
{code}
and every time it is called increment a counter and use that as the priority for the IterInfo.  So it would look like scanner.add(is1).add(is2).
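A sketch of the boolean-plus-counter idea, assuming a hypothetical `IterInfoSketch` stand-in for `IterInfo` and a hypothetical `ScannerSketch` in place of `Scanner`. Sorting on the flag first means a per-scan iterator with counter value 0 still runs after a table iterator at priority 50; `add` returns the scanner so calls can be chained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for IterInfo with the proposed per-scan flag.
class IterInfoSketch {
  final int priority;
  final String name;
  final boolean perScan; // true for iterators added at scan time

  IterInfoSketch(int priority, String name, boolean perScan) {
    this.priority = priority;
    this.name = name;
    this.perScan = perScan;
  }
}

class ScannerSketch {
  private final List<IterInfoSketch> iterators = new ArrayList<>();
  private int counter = 0; // incremented on every add() call

  ScannerSketch(List<IterInfoSketch> tableIterators) {
    iterators.addAll(tableIterators);
  }

  // The counter value becomes the priority; returning this allows add(is1).add(is2).
  ScannerSketch add(String scanIteratorName) {
    iterators.add(new IterInfoSketch(counter++, scanIteratorName, true));
    return this;
  }

  List<String> executionOrder() {
    List<IterInfoSketch> sorted = new ArrayList<>(iterators);
    // Sort by the boolean first (false before true), then by priority.
    sorted.sort(Comparator.comparing((IterInfoSketch i) -> i.perScan)
        .thenComparingInt(i -> i.priority));
    List<String> names = new ArrayList<>();
    for (IterInfoSketch i : sorted) {
      names.add(i.name);
    }
    return names;
  }
}
```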
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Christopher Tubbs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452801#comment-13452801 ] 

Christopher Tubbs commented on ACCUMULO-759:
--------------------------------------------

I do like the ability to chain added by returning a Scanner, though I still prefer "append" over "add" due to the tendency for "add" to get overloaded and confusing. Also, if "append" is the behavior for scan-time iterators, without the priority, then the term "scan" can be dropped from the method. So, "appendScanIterator(ScanIteratorSetting)" becomes "scanner.appendIterator(IteratorSetting)".

Also, the boolean seems to achieve the same thing as the convention of <1024 vs. >=1024 (scan iterators would just start at 1024, and +1 for each successive iterator appended). However, the boolean is more restrictive than this, because it prevents insertion of an iterator at other points in the scan. So, I guess it comes down to whether or not the current behavior should be modified in this restrictive way. Personally, I think it shouldn't be. Consider two use cases:

TableA is configured with a per-table iterator that groups and displays rows as JSON upon query. A query framework is built on this table that allows users to filter out particular columns from each row at scan time (relational algebra projection). However, the view will always be JSON. It seems reasonable to set a per-table iterator that converts rows to JSON at priority 500, and at scan-time, inject the filtering iterator at priority 400.

Now, this is a trivial example, where users are constrained to a particular view that could just as easily be added at scan time. However, consider the use case where an iterator is applied to a table to enforce a view policy that is intended to protect patient privacy or enforce a DRM scheme on multimedia content. Such an iterator may allow lower-priority filters, but could only show counts of the matching results. Alternatively, if such an iterator is given the proper payment method, it could encode the data with a DRM scheme to lease the queried content to a subscriber for some requested period of time.

These are just a few examples of why I think it would be too constraining to only allow appending scan-time iterators and not allow injecting them at a lower priority.
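To make the first use case concrete, here is a toy model, assuming rows are lists of `column=value` strings instead of Key/Value pairs and that lower priorities run first; `JsonViewExample`, `TO_JSON`, and `project` are hypothetical. It shows why position matters: a projection injected at 400 sees raw columns, while the same projection appended after the JSON iterator sees only a JSON string and drops everything.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical model of the JSON-view use case for a single row.
class JsonViewExample {

  // Table iterator (priority 500 in the example): collapse the row into one JSON string.
  static final UnaryOperator<List<String>> TO_JSON = entries -> {
    String body = entries.stream()
        .map(e -> e.split("=", 2))
        .map(kv -> "\"" + kv[0] + "\":\"" + kv[1] + "\"")
        .collect(Collectors.joining(","));
    return List.of("{" + body + "}");
  };

  // Scan-time projection: keep only entries for the requested column.
  static UnaryOperator<List<String>> project(String column) {
    return entries -> entries.stream()
        .filter(e -> e.startsWith(column + "="))
        .collect(Collectors.toList());
  }

  // Apply iterators in ascending priority order (lowest priority runs first).
  static List<String> scan(List<String> row, Map<Integer, UnaryOperator<List<String>>> stack) {
    List<String> result = row;
    for (UnaryOperator<List<String>> it : new TreeMap<>(stack).values()) {
      result = it.apply(result);
    }
    return result;
  }
}
```

With the projection at 400 the row `a=1, b=2` comes back as `{"a":"1"}`; with the projection placed after the JSON iterator, nothing survives.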
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Adam Fuchs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453457#comment-13453457 ] 

Adam Fuchs commented on ACCUMULO-759:
-------------------------------------

What if we went with a more immutable form of Scanner, which was more representative of the data set? This would allow things like:

{code:java}

Scanner allTheThings = connector.createScanner(...);
ScanIteratorSetting aggregateEverything = new ScanIteratorSetting(MyAggregatingIterator.class,properties);
Scanner aggregatedThings = allTheThings.transform(aggregateEverything);

// scan over the table with the iterator
for(Entry<Key,Value> aggregate: aggregatedThings)
{
  ...
}

// scan over the table without extra scan-time iterators
for(Entry<Key,Value> thing: allTheThings)
{
  ...
}

{code}

I think this might be an easier way for users to conceptualize iterators.
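A minimal sketch of the immutable style, under the assumption that a view holds a source plus an ordered list of transforms; `ImmutableScan` is hypothetical and entries are plain strings. `transform` copies the transform list and returns a new view, so the original view keeps scanning without the extra iterator, mirroring `allTheThings` versus `aggregatedThings` above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical immutable view: transform() never mutates the receiver.
final class ImmutableScan implements Iterable<String> {
  private final List<String> source;
  private final List<UnaryOperator<List<String>>> transforms;

  ImmutableScan(List<String> source) {
    this(source, Collections.emptyList());
  }

  private ImmutableScan(List<String> source, List<UnaryOperator<List<String>>> transforms) {
    this.source = source;
    this.transforms = transforms;
  }

  // Returns a new view with f applied after all existing transforms.
  ImmutableScan transform(UnaryOperator<List<String>> f) {
    List<UnaryOperator<List<String>>> next = new ArrayList<>(transforms);
    next.add(f);
    return new ImmutableScan(source, Collections.unmodifiableList(next));
  }

  @Override
  public Iterator<String> iterator() {
    List<String> data = source;
    for (UnaryOperator<List<String>> f : transforms) {
      data = f.apply(data);
    }
    return data.iterator();
  }
}
```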
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Christopher Tubbs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452548#comment-13452548 ] 

Christopher Tubbs commented on ACCUMULO-759:
--------------------------------------------

I don't think the negative numbers would be easy to understand. The "appendScanIterator(...)" method I suggested might work if it were implemented using Keith's suggested "conn.tableOperations().getMaxIteratorPriority(table)".

An alternative to the "getMaxIteratorPriority(table)" would be to do something like TCP port numbering... reserve 1023 and below for per-table iterators (if a table has more than this in use, its schema is probably pretty poorly designed and the data should simply be re-ingested), and allow 1024 and above for client-configured scan iterators (using my "append" suggestion to start at 1024 by default; my suggested "insert" method should still allow injecting into the <1024 priority space). This change is not backwards compatible.
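A sketch of the port-style convention, assuming the same lower-runs-first ordering; `PortStylePriorities` and its methods are hypothetical. Appends take 1024, 1025, ... in call order, skipping occupied slots, while `insert` lets a caller target the reserved range below 1024.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: priorities < 1024 are reserved for per-table iterators,
// appended scan iterators start at 1024.
class PortStylePriorities {
  static final int FIRST_SCAN_PRIORITY = 1024;

  private final TreeMap<Integer, String> iterators = new TreeMap<>();
  private int nextAppend = FIRST_SCAN_PRIORITY;

  PortStylePriorities(Map<Integer, String> tableIterators) {
    for (int p : tableIterators.keySet()) {
      if (p >= FIRST_SCAN_PRIORITY) {
        throw new IllegalArgumentException("table iterator outside reserved range: " + p);
      }
    }
    iterators.putAll(tableIterators);
  }

  // "append": next free slot at or above 1024, in call order.
  int append(String name) {
    while (iterators.containsKey(nextAppend)) {
      nextAppend++;
    }
    int p = nextAppend++;
    iterators.put(p, name);
    return p;
  }

  // "insert": caller picks the slot, including the reserved < 1024 range.
  void insert(int priority, String name) {
    if (iterators.containsKey(priority)) {
      throw new IllegalArgumentException("priority already in use: " + priority);
    }
    iterators.put(priority, name);
  }

  List<String> executionOrder() {
    return new ArrayList<>(iterators.values());
  }
}
```

The constructor rejects table iterators at 1024 or above, which is where existing configurations would break under this scheme.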
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452136#comment-13452136 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

I like the approach I proposed because it does not change the current API.  There is one issue with it though.  It's subject to race conditions.  A table iterator configured after conn.tableOperations().getMaxIteratorPriority() is called may result in the scan iterator not coming after all table iterators.  In this case the user's intent is not satisfied, because the user's intent is not directly communicated to the system.
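A small model of the race, assuming a hypothetical static helper standing in for the proposed `getMaxIteratorPriority` and a merged stack where lower priorities run first. The client computes `max + 1` from a snapshot; a table iterator configured afterwards at a higher priority ends up running after the scan iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the getMaxIteratorPriority() race.
class MaxPriorityRace {
  // Stand-in for the proposed tableOperations().getMaxIteratorPriority(table).
  static int getMaxIteratorPriority(Map<Integer, String> tableIterators) {
    int max = 0;
    for (int p : tableIterators.keySet()) {
      max = Math.max(max, p);
    }
    return max;
  }

  // Merge table and scan iterators; names in execution order (lowest priority first).
  static List<String> executionOrder(Map<Integer, String> tableIterators, int scanPriority, String scanName) {
    TreeMap<Integer, String> merged = new TreeMap<>(tableIterators);
    merged.put(scanPriority, scanName);
    return new ArrayList<>(merged.values());
  }
}
```

Because the priority is computed on the client from a snapshot, nothing on the server knows the scan iterator was meant to be last.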
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453309#comment-13453309 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

After reading all of the comments, now I am thinking of the following.

I would avoid restricting the priority for tablet iterators to < 1024.  From an API perspective, I think the cleanest option may be to set a list.  If we add an appending method like appendScanIterator(), we would also need a method like clearScanIterators().  I think using a boolean or negative number is a behind-the-scenes implementation detail.  When the API uses a list, it does not need to expose either.

{code:java}
import java.util.HashMap;

// The purpose of this class is to allow the user to configure scan-time iterators; it does not allow the user to set the priority or name.
class ScanIteratorSetting extends IteratorSetting {

   // Has all of IteratorSetting's constructors, but priority and name are never arguments, e.g.:
   public ScanIteratorSetting(String iteratorClass) {
     super(-42, "does not matter", iteratorClass, new HashMap<String,String>());
   }

   // Note: if the IteratorSetting constructor itself calls setPriority()/setName(),
   // these overrides would throw during construction.
   @Override
   public void setPriority(int priority) {
     throw new UnsupportedOperationException();
   }

   @Override
   public void setName(String name) {
     throw new UnsupportedOperationException();
   }
}
{code}

Maybe ScanIteratorSetting should not extend IteratorSetting.  Maybe IteratorSetting and ScanIteratorSetting should have a common parent?

The scanner would keep all of its current methods and add the following method.

{code:java}

import java.util.List;

interface ScannerBase {

   public void addScanIterator(IteratorSetting cfg);
   public void removeScanIterator(String iteratorName);
   public void updateScanIteratorOption(String iteratorName, String key, String value);

   /**
    * Scan-time iterators that will execute server side, in the order given in the list, after all iterators configured for the table.
    *
    * This method will overwrite iterators previously set by {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    */
   void setIterators(List<ScanIteratorSetting> scanIterators);

   /**
    * A convenience method for setting a single scan-time iterator that will execute after all iterators configured for the table.
    *
    * This method will overwrite iterators previously set by {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    */
   void setIterator(ScanIteratorSetting scanIterator);

}

{code}

addScanIterator() should probably throw an exception if passed a ScanIteratorSetting.
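A sketch of the list-based semantics in the comment above, using hypothetical stand-ins `IterSetting`, `ScanIterSetting`, and `ListScannerModel` instead of the real Accumulo classes. Priority-based iterators interleave with table iterators as they do today; the list always runs afterwards in list order, each `setIterators`/`setIterator` call replaces the previous list, and `addScanIterator` rejects a `ScanIterSetting`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for IteratorSetting / ScanIteratorSetting (names only).
class IterSetting {
  final int priority;
  final String name;

  IterSetting(int priority, String name) {
    this.priority = priority;
    this.name = name;
  }
}

class ScanIterSetting extends IterSetting {
  // Priority is a placeholder and is never used for ordering.
  ScanIterSetting(String iteratorClass) {
    super(-1, iteratorClass);
  }
}

class ListScannerModel {
  private final TreeMap<Integer, String> prioritized = new TreeMap<>();
  private List<ScanIterSetting> appended = new ArrayList<>();

  void addScanIterator(IterSetting cfg) {
    if (cfg instanceof ScanIterSetting) {
      throw new IllegalArgumentException("use setIterators() for a ScanIterSetting");
    }
    prioritized.put(cfg.priority, cfg.name);
  }

  // Overwrites whatever a previous setIterators/setIterator call configured.
  void setIterators(List<ScanIterSetting> scanIterators) {
    appended = new ArrayList<>(scanIterators);
  }

  void setIterator(ScanIterSetting scanIterator) {
    setIterators(List.of(scanIterator));
  }

  // Table and priority-based iterators interleave by priority; the list runs last.
  List<String> executionOrder(Map<Integer, String> tableIterators) {
    TreeMap<Integer, String> merged = new TreeMap<>(tableIterators);
    merged.putAll(prioritized);
    List<String> order = new ArrayList<>(merged.values());
    for (ScanIterSetting s : appended) {
      order.add(s.name);
    }
    return order;
  }
}
```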

                

[jira] [Comment Edited] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453309#comment-13453309 ] 

Keith Turner edited comment on ACCUMULO-759 at 9/12/12 6:16 AM:
----------------------------------------------------------------

After reading all of the comments, now I am thinking of the following.

I would avoid restricting the priority of table iterators to < 1024. From an API perspective, I think the cleanest option may be to set a list. With a method like appendScanIterator(), we would also need a method like clearScanIterators(). I think using a boolean or a negative number is a behind-the-scenes implementation detail; when the API uses a list, it does not need to expose either.

{code:java}
import java.util.HashMap;
import org.apache.accumulo.core.client.IteratorSetting;

// The purpose of this class is to let the user configure scan-time iterators; it does not allow the user to set the priority or name.
class ScanIteratorSetting extends IteratorSetting {

   // Has all of IteratorSetting's constructors, but priority and name are never arguments, e.g.:
   public ScanIteratorSetting(String iteratorClass) {
     // Placeholder priority and name, never exposed to the user. IteratorSetting's constructor
     // currently requires a positive priority, so that check (or the parent class) would need to change.
     super(-42, "does not matter", iteratorClass, new HashMap<String,String>());
   }

   public void setPriority(int priority) {
     throw new UnsupportedOperationException();
   }

   public void setName(String name) {
     throw new UnsupportedOperationException();
   }
}
{code}

Maybe ScanIteratorSetting should not extend IteratorSetting.  Maybe IteratorSetting and ScanIteratorSetting should have a common parent?

The scanner would keep all of its current methods and add the following methods.

{code:java}

interface ScannerBase {

   public void addScanIterator(IteratorSetting cfg);
   public void removeScanIterator(String iteratorName);
   public void updateScanIteratorOption(String iteratorName, String key, String value);

   /**
    * Sets the scan-time iterators that will execute server side, in the order given in the list, after all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    *
    * This method will have no effect on iterators set by {@link #addScanIterator(IteratorSetting)}.
    */
   void setIterators(List<ScanIteratorSetting> scanIterators);

   /**
    * A convenience method for setting a single scan-time iterator that will execute after all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    *
    * This method will have no effect on iterators set by {@link #addScanIterator(IteratorSetting)}.
    */
   void setIterator(ScanIteratorSetting scanIterator);

}

{code}

addScanIterator() should probably throw an exception if passed a ScanIteratorSetting.
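The point that list position is a behind-the-scenes detail can be illustrated with a minimal sketch. It assumes the client already knows the highest priority among the table's configured iterators (passed in as `tableMaxPriority`) and turns the ordered list given to setIterators(...) into consecutive priorities with generated names. `ScanIteratorSpec`, `ResolvedIterator`, and `ScanIteratorResolver` are hypothetical names, not Accumulo classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the proposed ScanIteratorSetting: the user supplies only the
// iterator class (options omitted here), never a priority or a name.
final class ScanIteratorSpec {
  final String iteratorClass;

  ScanIteratorSpec(String iteratorClass) {
    this.iteratorClass = iteratorClass;
  }
}

// What the client could send to the server behind the scenes.
final class ResolvedIterator {
  final int priority;
  final String name;
  final String iteratorClass;

  ResolvedIterator(int priority, String name, String iteratorClass) {
    this.priority = priority;
    this.name = name;
    this.iteratorClass = iteratorClass;
  }
}

final class ScanIteratorResolver {
  // Assigns tableMaxPriority + 1, + 2, ... in list order, so every scan-time iterator runs
  // after all table iterators and in the order the user listed them.
  static List<ResolvedIterator> resolve(int tableMaxPriority, List<ScanIteratorSpec> specs) {
    List<ResolvedIterator> out = new ArrayList<>();
    for (int i = 0; i < specs.size(); i++) {
      // Math.addExact throws ArithmeticException instead of silently wrapping negative.
      int priority = Math.addExact(tableMaxPriority, i + 1);
      // Generated name; a real implementation would also avoid clashing with table iterator names.
      String name = "scanIter" + i;
      out.add(new ResolvedIterator(priority, name, specs.get(i).iteratorClass));
    }
    return Collections.unmodifiableList(out);
  }

  public static void main(String[] args) {
    List<ScanIteratorSpec> specs = new ArrayList<>();
    specs.add(new ScanIteratorSpec("org.bar.FooIter"));
    specs.add(new ScanIteratorSpec("org.bar.BarIter"));
    for (ResolvedIterator it : resolve(20, specs)) {
      System.out.println(it.priority + " " + it.name + " " + it.iteratorClass);
    }
  }
}
```

The user only ever sees the ordered list; the priorities and names exist solely in what the client sends to the server.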
                

                  

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452156#comment-13452156 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

One other thought I had was to use negative priorities, though I thought this was kind of ugly. Negative numbers are currently not allowed. We could continue to disallow negative numbers for table iterators and allow them for scan-time iterators. Negative priorities would always be interpreted on the server side as "after all table iterators". If negative numbers were used, we would want to do it in such a way that the user never enters a negative number.
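A minimal sketch of the server-side rule described here, assuming priorities are plain `int`s and the server simply sorts them: non-negative priorities (table iterators) come first, negative priorities (scan-time iterators) come after all of them, and each group is in ascending order. `NegativePriorityOrder` is a hypothetical name. The `main` method also shows why `Integer.MAX_VALUE + 1` is a natural first scan-time slot: `int` arithmetic wraps around to `Integer.MIN_VALUE`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical server-side ordering, not Accumulo code.
final class NegativePriorityOrder {
  static final Comparator<Integer> ORDER = (a, b) -> {
    boolean aScan = a < 0; // negative: scan-time iterator
    boolean bScan = b < 0;
    if (aScan != bScan) {
      return aScan ? 1 : -1; // any scan-time iterator sorts after any table iterator
    }
    return Integer.compare(a, b); // same group: ordinary ascending order
  };

  static List<Integer> sorted(List<Integer> priorities) {
    List<Integer> copy = new ArrayList<>(priorities);
    copy.sort(ORDER);
    return copy;
  }

  public static void main(String[] args) {
    int firstScan = Integer.MAX_VALUE + 1; // wraps to Integer.MIN_VALUE
    List<Integer> priorities = new ArrayList<>();
    priorities.add(firstScan + 1); // second scan-time iterator
    priorities.add(20);            // table iterator
    priorities.add(firstScan);     // first scan-time iterator
    priorities.add(10);            // table iterator
    System.out.println(sorted(priorities)); // prints [10, 20, -2147483648, -2147483647]
  }
}
```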

We could add the following to IteratorSetting:

{code:java}
  public static final int MAX_TABLE_PRIORITY = Integer.MAX_VALUE;
{code}

The code example I gave in my earlier comment would change to the following.

{code:java}
// Assume conn and scanner are initialized somehow; this just shows their types.
   Connector conn;
   Scanner scanner;

   int tableMax = IteratorSetting.MAX_TABLE_PRIORITY + 1; // int overflow: this is effectively Integer.MIN_VALUE

   scanner.addScanIterator(new IteratorSetting(tableMax++, "foo1", "org.bar.FooIter"));
   scanner.addScanIterator(new IteratorSetting(tableMax++, "foo2", "org.bar.BarIter"));
{code}

The above is ugly, but I just want to show my thought process.  I think the code below is much less offensive from a user perspective.  It does something screwy with negative numbers behind the scenes, but that is hidden from the user.

{code:java}
class ScanIteratorSetting extends IteratorSetting {

   // First scan-time iterator: runs after all table iterators.
   public ScanIteratorSetting(String name, String iteratorClass) {
     super(Integer.MIN_VALUE, name, iteratorClass);
   }

   // Runs immediately after the given predecessor.
   public ScanIteratorSetting(ScanIteratorSetting predecessor, String name, String iteratorClass) {
     super(predecessor.getPriority() + 1, name, iteratorClass);
   }
}
{code}

So now the code would look like this.

{code:java}
// Assume conn and scanner are initialized somehow; this just shows their types.
   Connector conn;
   Scanner scanner;

   ScanIteratorSetting is1 = new ScanIteratorSetting("foo1", "org.bar.FooIter"); // comes after all table iterators
   scanner.addScanIterator(is1);

   ScanIteratorSetting is2 = new ScanIteratorSetting(is1, "foo2", "org.bar.BarIter"); // comes after all table iterators and after foo1
   scanner.addScanIterator(is2);
{code}

Can we make the code for chaining iterators more compact and intuitive?
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452117#comment-13452117 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

We do need to make this easier for users. Currently, if the user wants scan-time iterators to come after the iterators configured for the table, they must guess and choose some "large" priority that should make this happen. I think we should try to make achieving this goal easier while preserving the current API. One possible way to achieve this is to add a getMaxIteratorPriority() method to table operations. A user could then do something like the following.

{code:java}
// Assume conn and scanner are initialized somehow; this just shows their types.
   Connector conn;
   Scanner scanner;

   int tableMax = conn.tableOperations().getMaxIteratorPriority();

   // Pre-increment so the first scan-time iterator gets a priority above every table iterator.
   scanner.addScanIterator(new IteratorSetting(++tableMax, "foo1", "org.bar.FooIter"));
   scanner.addScanIterator(new IteratorSetting(++tableMax, "foo2", "org.bar.BarIter"));

{code}

If tableMax overflows, I think the code above will throw an exception because IteratorSetting's constructor ensures the priority is positive.
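A possible implementation of the proposed getMaxIteratorPriority(), sketched under the assumption that it reads the table's scan-scope iterator properties in Accumulo's `table.iterator.scan.<name>=<priority>,<class>` format, where option keys take the form `table.iterator.scan.<name>.opt.<option>`. The helper name and the choice to return 0 when no scan iterators are configured are hypothetical. The first scan-time iterator needs a priority strictly greater than the returned value.

```java
import java.util.Map;
import java.util.TreeMap;

final class IteratorPriorities {
  private static final String SCAN_PREFIX = "table.iterator.scan.";

  // Hypothetical helper: highest priority among scan-scope table iterators, or 0 if none.
  static int maxScanIteratorPriority(Map<String, String> tableProps) {
    int max = 0;
    for (Map.Entry<String, String> e : tableProps.entrySet()) {
      String key = e.getKey();
      if (!key.startsWith(SCAN_PREFIX)) {
        continue; // other scopes (minc, majc) or unrelated properties
      }
      // Option keys look like table.iterator.scan.<name>.opt.<option>; skip them.
      if (key.substring(SCAN_PREFIX.length()).contains(".")) {
        continue;
      }
      String value = e.getValue();
      int comma = value.indexOf(',');
      if (comma < 0) {
        continue; // malformed entry; a real implementation would probably reject it
      }
      max = Math.max(max, Integer.parseInt(value.substring(0, comma).trim()));
    }
    return max;
  }

  public static void main(String[] args) {
    Map<String, String> props = new TreeMap<>();
    props.put("table.iterator.scan.vers", "20,org.apache.accumulo.core.iterators.user.VersioningIterator");
    props.put("table.iterator.scan.vers.opt.maxVersions", "1");
    props.put("table.iterator.majc.vers", "20,org.apache.accumulo.core.iterators.user.VersioningIterator");
    int tableMax = maxScanIteratorPriority(props);
    System.out.println("first scan-time iterator priority: " + (tableMax + 1));
  }
}
```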
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Christopher Tubbs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453566#comment-13453566 ] 

Christopher Tubbs commented on ACCUMULO-759:
--------------------------------------------

I see the value in treating the Scanner as an immutable view of a dataset within client code, without interference from per-table config. However, I think it would be a simple matter to subclass Scanner for this purpose. A Scanner is a scanner over a data source; it is not strictly a dataset. I believe I spoke to Adam previously about creating such an API: one where you would manipulate a Query object representing a data source and then execute it. Perhaps that's still a reasonable option?

It would still be reasonable for Scanner to have built-in support for things like "after all per-table iterators". Perhaps priority isn't the best way to represent it, though? Keith and I talked about possibly creating an API where iterators are constructed more like this:
{code:java}
IteratorSetting a, b, c;
IteratorChain chain = new IteratorChain();
chain.insertAfter(LAST, a);
chain.insertBefore(a.getName(), b);
chain.insertAfter(b.getName(), c);
{code}

One other thing to consider is that any change should probably be consistent across all APIs, including per-table configuration and methods like tableOperations().compact().
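The IteratorChain idea could be sketched as follows, assuming iterators are positioned purely by name and `LAST` is a reserved anchor meaning "after everything already in the chain". `IteratorChain` and `LAST` come from the comment but have no existing implementation; plain strings stand in for `IteratorSetting` objects, and the duplicate-name check is an added assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: iterators are positioned relative to each other by name
// instead of by integer priority.
final class IteratorChain {
  // Reserved anchor meaning "after everything currently in the chain".
  static final String LAST = "<LAST>";

  private final List<String> names = new ArrayList<>();

  void insertAfter(String anchor, String name) {
    checkNew(name);
    if (LAST.equals(anchor)) {
      names.add(name);
    } else {
      names.add(indexOf(anchor) + 1, name);
    }
  }

  void insertBefore(String anchor, String name) {
    checkNew(name);
    names.add(indexOf(anchor), name);
  }

  // Application order: the first entry is applied to the data first.
  List<String> order() {
    return Collections.unmodifiableList(names);
  }

  private int indexOf(String anchor) {
    int i = names.indexOf(anchor);
    if (i < 0) {
      throw new IllegalArgumentException("no iterator named " + anchor);
    }
    return i;
  }

  private void checkNew(String name) {
    if (LAST.equals(name) || names.contains(name)) {
      throw new IllegalArgumentException("name already used or reserved: " + name);
    }
  }

  public static void main(String[] args) {
    IteratorChain chain = new IteratorChain();
    chain.insertAfter(LAST, "a");
    chain.insertBefore("a", "b");
    chain.insertAfter("b", "c");
    System.out.println(chain.order()); // prints [b, c, a]
  }
}
```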
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Billie Rinaldi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451943#comment-13451943 ] 

Billie Rinaldi commented on ACCUMULO-759:
-----------------------------------------

IteratorSetting is also used to configure non-scan-time iterators.  It seems like the change would make the table operations methods worse in favor of making the scanner methods only slightly better (and less functional).  This seems like a documentation problem, not an API problem.

What if we left the current functionality the way it is, and added a new method for scan iterator configuration that would ignore the priority, e.g. addScanIterator(IteratorSetting si, boolean ignorePriority)?
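A minimal model of the proposed addScanIterator(IteratorSetting si, boolean ignorePriority), assuming iterators added with ignorePriority set run after every table iterator in call order, while the others keep today's merge-by-priority behavior. `ScanIteratorPlan` and its nested `Entry` (a stand-in for `IteratorSetting`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, not Accumulo code.
final class ScanIteratorPlan {
  static final class Entry {
    final String name;
    final int priority;

    Entry(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }
  }

  private final List<Entry> byPriority = new ArrayList<>();
  private final List<String> afterTable = new ArrayList<>();

  void addScanIterator(Entry setting, boolean ignorePriority) {
    if (ignorePriority) {
      afterTable.add(setting.name); // priority ignored: call order decides
    } else {
      byPriority.add(setting);      // current behavior: merged with table iterators by priority
    }
  }

  // Table iterators and priority-based scan iterators sorted by priority (List.sort is
  // stable, so ties keep table iterators first), then ignorePriority iterators in call order.
  List<String> effectiveOrder(List<Entry> tableIterators) {
    List<Entry> merged = new ArrayList<>(tableIterators);
    merged.addAll(byPriority);
    merged.sort(Comparator.comparingInt((Entry x) -> x.priority));
    List<String> order = new ArrayList<>();
    for (Entry e : merged) {
      order.add(e.name);
    }
    order.addAll(afterTable);
    return order;
  }

  public static void main(String[] args) {
    List<Entry> table = new ArrayList<>();
    table.add(new Entry("vers", 20));
    ScanIteratorPlan plan = new ScanIteratorPlan();
    plan.addScanIterator(new Entry("early", 15), false);
    plan.addScanIterator(new Entry("late", 1), true);
    System.out.println(plan.effectiveOrder(table)); // prints [early, vers, late]
  }
}
```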
                

[jira] [Comment Edited] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453309#comment-13453309 ] 

Keith Turner edited comment on ACCUMULO-759 at 9/12/12 6:21 AM:
----------------------------------------------------------------

After reading all of the comments, now I am thinking of the following.

I would avoid restricting the priority of table iterators to < 1024. From an API perspective, I think the cleanest option may be to set a list. With a method like appendScanIterator(), we would also need a method like clearScanIterators(). I think using a boolean or a negative number is a behind-the-scenes implementation detail; when the API uses a list, it does not need to expose either.

{code:java}
import java.util.HashMap;
import org.apache.accumulo.core.client.IteratorSetting;

// The purpose of this class is to let the user configure scan-time iterators; it does not allow the user to set the priority or name.
class ScanIteratorSetting extends IteratorSetting {

   // Has all of IteratorSetting's constructors, but priority and name are never arguments, e.g.:
   public ScanIteratorSetting(String iteratorClass) {
     // Placeholder priority and name, never exposed to the user. IteratorSetting's constructor
     // currently requires a positive priority, so that check (or the parent class) would need to change.
     super(-42, "does not matter", iteratorClass, new HashMap<String,String>());
   }

   public void setPriority(int priority) {
     throw new UnsupportedOperationException();
   }

   public void setName(String name) {
     throw new UnsupportedOperationException();
   }
}
{code}

Maybe ScanIteratorSetting should not extend IteratorSetting.  Maybe IteratorSetting and ScanIteratorSetting should have a common parent?

The scanner would keep all of its current methods and add the following methods.

{code:java}

interface ScannerBase {

   /**
    * This is a legacy method; you should probably use {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}, which are much
    * simpler. This method allows you to insert scan iterators before iterators configured for the table, which is not possible
    * with {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    */
   public void addScanIterator(IteratorSetting cfg);
   public void removeScanIterator(String iteratorName);
   public void updateScanIteratorOption(String iteratorName, String key, String value);

   /**
    * Sets the scan-time iterators that will execute server side, in the order given in the list, after all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    *
    * This method will have no effect on iterators set by {@link #addScanIterator(IteratorSetting)}.
    */
   void setIterators(List<ScanIteratorSetting> scanIterators);

   /**
    * A convenience method for setting a single scan-time iterator that will execute after all iterators configured for the table.
    * 
    * This method will overwrite iterators previously set by {@link #setIterators(List)} or {@link #setIterator(ScanIteratorSetting)}.
    *
    * This method will have no effect on iterators set by {@link #addScanIterator(IteratorSetting)}.
    */
   void setIterator(ScanIteratorSetting scanIterator);

}

{code}

addScanIterator() should probably throw an exception if passed a ScanIteratorSetting.
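The overwrite semantics in the javadoc above can be modeled with a small, hypothetical bookkeeping class: `setIterators`/`setIterator` replace one ordered list, while `addScanIterator`/`removeScanIterator` manage a separate, name-keyed set of legacy iterators that the new methods never touch. Strings stand in for `IteratorSetting` and `ScanIteratorSetting`; this is a sketch, not Accumulo's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping for the proposed ScannerBase methods.
final class ScannerIteratorConfig {
  // Legacy, priority-based iterators from addScanIterator(...), keyed by name.
  private final Map<String, Integer> legacy = new LinkedHashMap<>();
  // Ordered iterators from setIterators(...)/setIterator(...); each call replaces the list.
  private List<String> ordered = Collections.emptyList();

  void addScanIterator(String name, int priority) {
    if (legacy.containsKey(name)) {
      throw new IllegalArgumentException("iterator name already in use: " + name);
    }
    legacy.put(name, priority);
  }

  void removeScanIterator(String name) {
    legacy.remove(name);
  }

  // Overwrites anything set earlier by setIterators/setIterator; never touches the legacy map.
  void setIterators(List<String> iteratorClasses) {
    ordered = Collections.unmodifiableList(new ArrayList<>(iteratorClasses));
  }

  void setIterator(String iteratorClass) {
    setIterators(Collections.singletonList(iteratorClass));
  }

  Map<String, Integer> legacyIterators() {
    return Collections.unmodifiableMap(legacy);
  }

  List<String> orderedIterators() {
    return ordered;
  }

  public static void main(String[] args) {
    ScannerIteratorConfig cfg = new ScannerIteratorConfig();
    cfg.addScanIterator("early", 5);
    cfg.setIterators(Arrays.asList("org.bar.FooIter", "org.bar.BarIter"));
    cfg.setIterator("org.bar.BazIter"); // replaces FooIter and BarIter, keeps "early"
    System.out.println(cfg.orderedIterators() + " " + cfg.legacyIterators());
  }
}
```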
                
                  

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452121#comment-13452121 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

The javadoc for IteratorSetting should point to the javadoc for getMaxIteratorPriority().   Javadoc on one of the methods should outline how to use it with an example.
                

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453331#comment-13453331 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

I am thinking that instead of the two setIterator(s) methods I proposed, we could have one that takes a varargs argument. 

{code:java}
interface ScannerBase {
   void setIterators(ScanIteratorSetting  ... scanIterators);
}
{code}
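A minimal sketch of the varargs variant, using strings as hypothetical stand-ins for `ScanIteratorSetting`. One side effect, assumed here rather than stated in the proposal, is that calling `setIterators()` with no arguments would clear the scan-time iterators, which could remove the need for a separate clear method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one varargs method covers both the list and single-iterator cases.
final class VarargsScannerSketch {
  private List<String> scanIterators = Collections.emptyList();

  // setIterators() clears, setIterators(a) sets one, setIterators(a, b) sets an ordered chain.
  void setIterators(String... iteratorClasses) {
    // Copy so later changes to a caller-supplied array cannot alter the configuration.
    scanIterators = Collections.unmodifiableList(Arrays.asList(iteratorClasses.clone()));
  }

  List<String> scanIterators() {
    return scanIterators;
  }

  public static void main(String[] args) {
    VarargsScannerSketch s = new VarargsScannerSketch();
    s.setIterators("org.bar.FooIter", "org.bar.BarIter");
    List<String> fromList = Arrays.asList("org.bar.BazIter");
    s.setIterators(fromList.toArray(new String[0])); // a List must be converted explicitly
    System.out.println(s.scanIterators()); // prints [org.bar.BazIter]
  }
}
```

One trade-off of varargs is that callers holding a `List` must convert it explicitly, as the `main` method shows.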
                
> remove priority setting for scan-time iterators
> -----------------------------------------------
>
>                 Key: ACCUMULO-759
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-759
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>              Labels: newbie
>
> Iterators have a priority setting that allows a user to order iterators arbitrarily. However that priority is an integer that doesn't directly convey the iterator's relationship to other iterators. I would postulate that nobody has ever needed to sneak in a scan-time iterator underneath a configured table iterator (please let me know if I'm wrong about this), and the effect of doing so is not easy to calculate. Many people have chosen a bad iterator priority and seen commutativity problems with previously configured iterators.
> I propose that we use more of an agglomerative approach to configuring scan-time iterators, in which the order of the iterator tree is the same order in which the addScanIterator method is called, and all scan-time iterators apply after the configured iterators apply. The change to the API should just be to remove the priority number, and the existing IteratorSetting constructor and accessors should be deprecated.
> With this change, we can think of an iterator as more of a functional modification to a data set, as in T' = f(T) or T'' = g(f(T)). This should make it easier for developers to use iterators correctly.
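The commutativity problem described in the issue can be illustrated with a small sketch. It compares one priority-sorted stack against the agglomerative ordering. The sketch assumes lower priority numbers are applied first (closer to the data). `Stage`, `IteratorOrdering`, the priorities 10 and 20, and the integer data are all hypothetical simplifications of real key/value iterators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical iterator stage: a name, a numeric priority, and a per-value function.
final class Stage {
  final String name;
  final int priority;
  final UnaryOperator<Integer> fn;

  Stage(String name, int priority, UnaryOperator<Integer> fn) {
    this.name = name;
    this.priority = priority;
    this.fn = fn;
  }
}

final class IteratorOrdering {
  // Priority model: table and scan-time stages share one stack, lowest priority applied first.
  static List<Stage> byPriority(List<Stage> table, List<Stage> scan) {
    List<Stage> all = new ArrayList<>(table);
    all.addAll(scan);
    all.sort(Comparator.comparingInt((Stage s) -> s.priority));
    return all;
  }

  // Agglomerative model: table stages by priority, then scan-time stages in the order added.
  static List<Stage> agglomerative(List<Stage> table, List<Stage> scan) {
    List<Stage> all = new ArrayList<>(table);
    all.sort(Comparator.comparingInt((Stage s) -> s.priority));
    all.addAll(scan);
    return all;
  }

  // Apply stages in list order, so [f, g] computes g(f(T)).
  static List<Integer> apply(List<Stage> stages, List<Integer> data) {
    List<Integer> result = data;
    for (Stage s : stages) {
      result = result.stream().map(s.fn).collect(Collectors.toList());
    }
    return result;
  }
}

class PriorityDemo {
  public static void main(String[] args) {
    List<Stage> table = Arrays.asList(new Stage("capAt10", 20, x -> Math.min(x, 10)));
    // Priority 10 silently places this scan-time stage underneath the table stage.
    List<Stage> scan = Arrays.asList(new Stage("double", 10, x -> x * 2));
    List<Integer> data = Arrays.asList(2, 7);

    System.out.println(IteratorOrdering.apply(IteratorOrdering.byPriority(table, scan), data));    // [4, 10]
    System.out.println(IteratorOrdering.apply(IteratorOrdering.agglomerative(table, scan), data)); // [4, 14]
  }
}
```

Under the priority model, the user's choice of 10 runs the scan-time stage before the table stage, giving capAt10(double(T)). The agglomerative model yields double(capAt10(T)) regardless of the number picked.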

[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453530#comment-13453530 ] 

Keith Turner commented on ACCUMULO-759:
---------------------------------------

bq. What if we went with a more immutable form of Scanner

Christopher and I have been discussing this. Billie proposed the following:

{noformat}
Scanner add(ScanIteratorSetting is)
{noformat}

add() could return 'this' or a new Scanner. I like the model where Scanner is immutable and all config changes return a new Scanner. However, I do not think it makes sense to apply this model to just one configuration method. That seems inconsistent to me. If add() or transform() has this behavior, then it seems that setRange(), fetchColumns(), etc. should also have this behavior. But I do not think we could make setRange() behave this way without breaking existing code.
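The following is a hypothetical sketch of the mixed model being argued against. In it, setRange() mutates the object in place, as the existing Scanner API does, while add() follows the immutable style and returns a new object. `MixedScanner` and its string-typed range are illustrative only. This is not a proposed API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical configuration object mixing both styles discussed in this thread.
final class MixedScanner {
  private String range;                 // stands in for a Range; mutated by setRange()
  private final List<String> iterators; // scan-time iterator names; never changed after construction

  MixedScanner() {
    this("(-inf, +inf)", Collections.emptyList());
  }

  private MixedScanner(String range, List<String> iterators) {
    this.range = range;
    this.iterators = Collections.unmodifiableList(new ArrayList<>(iterators));
  }

  // Mutating style, like the existing setters: changes this object.
  void setRange(String range) {
    this.range = range;
  }

  // Immutable style: leaves this object untouched and returns a copy with the iterator appended.
  MixedScanner add(String iteratorName) {
    List<String> next = new ArrayList<>(iterators);
    next.add(iteratorName);
    return new MixedScanner(range, next);
  }

  String range() { return range; }
  List<String> iterators() { return iterators; }
}

class MixedDemo {
  public static void main(String[] args) {
    MixedScanner s = new MixedScanner();
    s.setRange("[a, b)");  // takes effect on s
    s.add("myFilter");     // easy mistake: return value discarded, s is unchanged
    System.out.println(s.range() + " " + s.iterators());  // [a, b) []

    MixedScanner t = s.add("myFilter");
    t.setRange("[c, d)");  // mutates t only; s keeps [a, b)
    System.out.println(s.range() + " " + t.range() + " " + t.iterators());  // [a, b) [c, d) [myFilter]
  }
}
```

The first line printed shows the pitfall of mixing styles. A caller used to the mutating setters who ignores add()'s return value silently loses the iterator.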

    
                
[jira] [Commented] (ACCUMULO-759) remove priority setting for scan-time iterators

Posted by "Billie Rinaldi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453348#comment-13453348 ] 

Billie Rinaldi commented on ACCUMULO-759:
-----------------------------------------

I like it.
                