
Posted to dev@accumulo.apache.org by "Keith Turner (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 17:38:59 UTC

[jira] [Created] (ACCUMULO-403) Create general row selection iterator

Create general row selection iterator
-------------------------------------

                 Key: ACCUMULO-403
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-403
             Project: Accumulo
          Issue Type: New Feature
          Components: client, tserver
            Reporter: Keith Turner
            Assignee: Billie Rinaldi
             Fix For: 1.5.0


The WholeRowIterator supports filtering rows that meet certain criteria.  However, it reads the entire row into memory.  It is possible to efficiently select rows without reading them into memory by using two iterators: one for selection and one for reading.  When the selection iterator determines that a row is not needed, seek the read iterator past the row.

This pattern could be made into an easy-to-use iterator that users extend.  The iterator could have an abstract method that users implement to decide whether to select or filter a row.  It could look something like the following.


{noformat}

abstract class RowSelectionIterator extends WrappingIterator {

   public abstract boolean selectRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException;

}

{noformat}


Below is a simple example of a row selection iterator that returns rows that have the columns foo and bar.


{noformat}

class FooBarRowSelector extends RowSelectionIterator {
   public boolean selectRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException {

      Text row = rowIterator.getTopKey().getRow();
      //seek instead of scanning, this is more efficient for large rows w/ lots of columns...
      //if the row only has a few columns scanning is probably faster... also seeking the
      //columns in sorted order is more efficient.
      //an empty column family set with inclusive=false applies no column family restriction
      Set<ByteSequence> noFamilies = Collections.emptySet();
      rowIterator.seek(Range.exact(row, new Text("bar")), noFamilies, false);
      boolean sawBar = rowIterator.hasTop();

      rowIterator.seek(Range.exact(row, new Text("foo")), noFamilies, false);
      boolean sawFoo = rowIterator.hasTop();

      return sawBar && sawFoo;
   }
}

{noformat}
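
To make the two-iterator idea concrete without depending on Accumulo, here is a minimal plain-Java sketch. It models the table as a sorted map of flattened keys and is only an illustration under that assumption: RowSelectionSketch, key, afterRow, hasFooAndBar and scan are hypothetical names, not Accumulo APIs, and point lookups on the map stand in for seeking the selection iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.function.BiPredicate;

/** Plain-Java model of the pattern; nothing here is an Accumulo class. */
class RowSelectionSketch {
    // Separator between row and column in a flattened key (assumed absent from row names).
    static final char SEP = '\u0000';

    static String key(String row, String column) { return row + SEP + column; }

    // Smallest string that sorts after every key belonging to the given row.
    static String afterRow(String row) { return row + '\u0001'; }

    // Selection step: two point lookups, analogous to two exact seeks.
    static boolean hasFooAndBar(NavigableMap<String, String> table, String row) {
        return table.containsKey(key(row, "bar")) && table.containsKey(key(row, "foo"));
    }

    // Read step: stream selected rows, jump over rejected rows without visiting their columns.
    static List<String> scan(NavigableMap<String, String> table,
                             BiPredicate<NavigableMap<String, String>, String> selectRow) {
        List<String> out = new ArrayList<>();
        String pos = table.isEmpty() ? null : table.firstKey();
        while (pos != null) {
            String row = pos.substring(0, pos.indexOf(SEP));
            String rowEnd = afterRow(row);
            if (selectRow.test(table, row)) {
                // Emit the row one entry at a time; the row is never buffered as a whole.
                for (String k : table.subMap(pos, true, rowEnd, false).keySet()) {
                    out.add(k.replace(SEP, ':'));
                }
            }
            // In both cases, position the read cursor at the first key of the next row.
            pos = table.ceilingKey(rowEnd);
        }
        return out;
    }
}
```

Calling RowSelectionSketch.scan(table, RowSelectionSketch::hasFooAndBar) on a TreeMap returns only the entries of rows that contain both columns. A rejected row costs the two lookups plus one ceilingKey call, regardless of how many columns it has.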

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-403:
----------------------------------

    Fix Version/s:     (was: 1.4.1)
                   1.4.0
         Assignee: Keith Turner  (was: Billie Rinaldi)
    


[jira] [Commented] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227624#comment-13227624 ] 

Keith Turner commented on ACCUMULO-403:
---------------------------------------

I think having it seek if it wants different columns sounds reasonable.
                


[jira] [Commented] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227627#comment-13227627 ] 

Keith Turner commented on ACCUMULO-403:
---------------------------------------

I will add something to the javadoc.
                


[jira] [Commented] (ACCUMULO-403) Create general row selection iterator

Posted by "Billie Rinaldi (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226896#comment-13226896 ] 

Billie Rinaldi commented on ACCUMULO-403:
-----------------------------------------

It seems like the decisionIterator shouldn't be restricted to the same column families that are returned by the RowFilter.  Is the suggested usage that the subclass reseek in the acceptRow method if it wants different columns?
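
One way to picture the reseek idea is the plain-Java sketch below. It is a model under stated assumptions, not the RowFilter implementation: ReseekDecisionSketch, key and scan are hypothetical names, the table is a sorted map of flattened keys, and the fetched set stands in for the column families the scan returns. The decision callback gets the unrestricted view, so it can look up columns that are never returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.function.BiPredicate;

/** Plain-Java model: the decision looks at columns outside the returned set. */
class ReseekDecisionSketch {
    // Separator between row and column in a flattened key (assumed absent from row names).
    static final char SEP = '\u0000';

    static String key(String row, String column) { return row + SEP + column; }

    static List<String> scan(NavigableMap<String, String> table, Set<String> fetched,
                             BiPredicate<NavigableMap<String, String>, String> acceptRow) {
        List<String> out = new ArrayList<>();
        String pos = table.isEmpty() ? null : table.firstKey();
        while (pos != null) {
            String row = pos.substring(0, pos.indexOf(SEP));
            String rowEnd = row + '\u0001'; // sorts after every key of this row
            // The decision sees the whole table and may look up any column it wants.
            if (acceptRow.test(table, row)) {
                for (String k : table.subMap(pos, true, rowEnd, false).keySet()) {
                    String column = k.substring(k.indexOf(SEP) + 1);
                    // Only the fetched columns are returned to the caller.
                    if (fetched.contains(column)) {
                        out.add(row + ":" + column);
                    }
                }
            }
            pos = table.ceilingKey(rowEnd);
        }
        return out;
    }
}
```

With fetched set to the single column zed and a decision that requires foo and bar, only the zed entries of rows that also contain foo and bar come back.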
                


[jira] [Commented] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208640#comment-13208640 ] 

Keith Turner commented on ACCUMULO-403:
---------------------------------------

I am thinking that, for consistency, it may be better to call it the RowFilteringIterator and rename the selectRow method to filterRow or filter.  This would make it consistent with the FilteringIterator.

This could also be generalized to different parts of the key prefix.  For example, I would like to select column families that meet certain criteria.  Since row filtering is probably the most common case, it could extend a more general iterator for ease of use.
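
The prefix generalization might be sketched as follows in plain Java. This is an illustration only: PrefixFilterSketch, key, prefixOf and scan are hypothetical names, keys are assumed to be flattened row, family and qualifier strings, and depth 1 filters whole rows while depth 2 filters a column family within a row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.function.BiPredicate;

/** Plain-Java model of filtering by key prefix instead of by row only. */
class PrefixFilterSketch {
    // Separator between key parts (assumed absent from the parts themselves).
    static final char SEP = '\u0000';

    static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    // First `depth` parts of the key, including the trailing separator (depth is 1 or 2).
    static String prefixOf(String key, int depth) {
        int end = -1;
        for (int i = 0; i < depth; i++) {
            end = key.indexOf(SEP, end + 1);
        }
        return key.substring(0, end + 1);
    }

    static List<String> scan(NavigableMap<String, String> table, int depth,
                             BiPredicate<NavigableMap<String, String>, String> accept) {
        List<String> out = new ArrayList<>();
        String pos = table.isEmpty() ? null : table.firstKey();
        while (pos != null) {
            String prefix = prefixOf(pos, depth);
            // Replace the trailing separator to get a bound after every key with this prefix.
            String prefixEnd = prefix.substring(0, prefix.length() - 1) + '\u0001';
            if (accept.test(table, prefix)) {
                for (String k : table.subMap(pos, true, prefixEnd, false).keySet()) {
                    out.add(k.replace(SEP, ':'));
                }
            }
            // Jump to the first key of the next prefix group.
            pos = table.ceilingKey(prefixEnd);
        }
        return out;
    }
}
```

Row filtering is then the depth 1 case, which matches the suggestion that a row filter could extend a more general iterator.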
                


[jira] [Resolved] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner resolved ACCUMULO-403.
-----------------------------------

    Resolution: Fixed
    


[jira] [Updated] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-403:
----------------------------------

    Description: 
The WholeRowIterator supports filtering rows that meet certain criteria.  However, it reads the entire row into memory.  It is possible to efficiently select rows without reading them into memory by using two iterators: one for selection and one for reading.  When the selection iterator determines that a row is not needed, seek the read iterator past the row.

This pattern could be made into an easy-to-use iterator that users extend.  The iterator could have an abstract method that users implement to decide whether to select or filter a row.  It could look something like the following.


{noformat}

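abstract class RowSelectionIterator extends WrappingIterator {

   public abstract boolean selectRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException;

}

{noformat}


Below is a simple example of a row selection iterator that returns rows that have the columns foo and bar.


{noformat}

class FooBarRowSelector extends RowSelectionIterator {
   public boolean selectRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException {

      Text row = rowIterator.getTopKey().getRow();
      //seek instead of scanning, this is more efficient for large rows w/ lots of columns...
      //if the row only has a few columns scanning is probably faster... also seeking the
      //columns in sorted order is more efficient.
      //an empty column family set with inclusive=false applies no column family restriction
      Set<ByteSequence> noFamilies = Collections.emptySet();
      rowIterator.seek(Range.exact(row, new Text("bar")), noFamilies, false);
      boolean sawBar = rowIterator.hasTop();

      //skip the second seek when the first column is already missing
      if(!sawBar)
        return false;

      rowIterator.seek(Range.exact(row, new Text("foo")), noFamilies, false);
      boolean sawFoo = rowIterator.hasTop();

      return sawFoo;
   }
}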

{noformat}

  was:
The WholeRowIterator supports filtering rows that meet certain criteria.  However, it reads the entire row into memory.  It is possible to efficiently select rows without reading them into memory by using two iterators: one for selection and one for reading.  When the selection iterator determines that a row is not needed, seek the read iterator past the row.

This pattern could be made into an easy-to-use iterator that users extend.  The iterator could have an abstract method that users implement to decide whether to select or filter a row.  It could look something like the following.


{noformat}

abstract class RowSelectionIterator extends WrappingIterator {

   public abstract boolean selectRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException;

}

{noformat}


Below is a simple example of a row selection iterator that returns rows that have the columns foo and bar.


{noformat}

class FooBarRowSelector extends RowSelectionIterator {
   public boolean selectRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException {

      Text row = rowIterator.getTopKey().getRow();
      //seek instead of scanning, this is more efficient for large rows w/ lots of columns...
      //if the row only has a few columns scanning is probably faster... also seeking the
      //columns in sorted order is more efficient.
      //an empty column family set with inclusive=false applies no column family restriction
      Set<ByteSequence> noFamilies = Collections.emptySet();
      rowIterator.seek(Range.exact(row, new Text("bar")), noFamilies, false);
      boolean sawBar = rowIterator.hasTop();

      rowIterator.seek(Range.exact(row, new Text("foo")), noFamilies, false);
      boolean sawFoo = rowIterator.hasTop();

      return sawBar && sawFoo;
   }
}

{noformat}

    


[jira] [Updated] (ACCUMULO-403) Create general row selection iterator

Posted by "Keith Turner (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ACCUMULO-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-403:
----------------------------------

    Fix Version/s:     (was: 1.5.0)
                   1.4.1
    
