Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2015/03/10 15:51:38 UTC
[jira] [Commented] (ACCUMULO-3633) Please provide information on implementing custom iterators in the documentation

    [ https://issues.apache.org/jira/browse/ACCUMULO-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354999#comment-14354999 ] 

Keith Turner commented on ACCUMULO-3633:
----------------------------------------

[~elserj] some thoughts as I read over it. If you make a 2nd revision, would you mind putting it on RB?

 * init section : iterator config can come from places other than the table config
 * Instantiation section : Could also mention per-table classpath. Could also mention that one should not seek the source in init()
 * Iterator design section : List is mentioned, could also mention Tree of iterators. Could also use Tree terminology in the deepCopy section. Could mention deepCopy() should be called in init()
 * explain isolation: data sources do not change until the top iterator returns a key/value. Could move the [HTML isolation iterator documentation|https://github.com/apache/accumulo/blob/1.6.2/docs/src/main/resources/isolation.html] to this new section in the user manual
 * in addition to combiner and filter, could mention transforming iterator
 * next section : does it have to be a cached key value?
 * could explain why cross row operations are not recommended
 * hasTop section : java iterators have nothing similar to hasTop
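
To illustrate the hasTop point, here is a minimal sketch contrasting the two access patterns. `TopCursor` and `HasTopSketch` are hypothetical names made up for this example and only imitate the hasTop()/getTop/next() call shape; they are not the real SortedKeyValueIterator interface.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the hasTop()/getTop*()/next() protocol.
// This is NOT the real SortedKeyValueIterator interface.
class TopCursor {
  private final List<String> entries;
  private int pos = 0;

  TopCursor(List<String> entries) {
    this.entries = entries;
  }

  // True while the cursor is positioned on an entry.
  boolean hasTop() {
    return pos < entries.size();
  }

  // Returns the current entry without advancing, so it can be called repeatedly.
  String getTop() {
    return entries.get(pos);
  }

  // Advances to the following entry and returns nothing.
  void next() {
    pos++;
  }
}

class HasTopSketch {
  // Drains a TopCursor: check the position, read it, then advance.
  static List<String> drainCursor(List<String> entries) {
    List<String> out = new ArrayList<>();
    TopCursor cursor = new TopCursor(entries);
    while (cursor.hasTop()) {
      out.add(cursor.getTop());
      cursor.next();
    }
    return out;
  }

  // Drains a java.util.Iterator: next() both advances and returns the entry.
  static List<String> drainJavaIterator(List<String> entries) {
    List<String> out = new ArrayList<>();
    Iterator<String> it = entries.iterator();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

In this sketch hasTop() asks whether the current position holds an entry, while hasNext() asks whether another entry follows, which is why the two are hard to compare directly.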

Some code like the following showing how tserver will call iterators for a scan may be useful.

{code:java}


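 List<KeyValue> batch = new ArrayList<>();
 Range range = ...;  // range from client
 while(!overSizeLimit(batch)){
   // (re)build the iterator stack on top of the current system data sources
   SKVI source = systemIterator();
   for(SKVI iter : iterators){
     iter.init(source, opts, env);
     source = iter;
   }

   // read a batch of data to return to client
   SKVI topIter = source;  // the last iterator initialized is the top of the stack
   topIter.seek(range, ...);

   while(topIter.hasTop() && !overSizeLimit(batch)){
     Key key = topIter.getTopKey();
     Value val = topIter.getTopValue();
     batch.add(new KeyValue(key, val));
     if(systemDataSourcesChanged()){
       // code does not show isolation case, which will keep using same data sources until a row boundary is hit
       // resume just after the last key returned, on a freshly built stack
       range = new Range(key, false, range.getEndKey(), range.isEndKeyInclusive());
       break;
     }
     topIter.next();
   }

   if(!topIter.hasTop())
     break;  // range exhausted, nothing more to read
 }
 // return batch of key values to client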
{code}
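
For anyone who wants to step through that control flow, the following is a self-contained toy model of it. Everything in it is an assumption made for illustration: keys are plain Strings, values are left out, a "data source change" is simulated by inserting `lateKey` into the table after `changeAfter` keys have been read, and `ScanLoopSketch`, `Cursor`, and `scan` are hypothetical names rather than Accumulo classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Toy model of the scan loop; none of these names are Accumulo classes.
class ScanLoopSketch {

  // Stand-in for an iterator stack built over a fixed snapshot of the data.
  static final class Cursor {
    private final List<String> keys;
    private int pos;

    Cursor(NavigableSet<String> snapshot) {
      this.keys = new ArrayList<>(snapshot);
    }

    // Positions on the first key strictly after afterKey (or the first key if null).
    void seek(String afterKey) {
      pos = 0;
      while (afterKey != null && pos < keys.size() && keys.get(pos).compareTo(afterKey) <= 0) {
        pos++;
      }
    }

    boolean hasTop() {
      return pos < keys.size();
    }

    String getTopKey() {
      return keys.get(pos);
    }

    void next() {
      pos++;
    }
  }

  // Reads up to batchSize keys. Mutates table once to simulate a data source change.
  static List<String> scan(NavigableSet<String> table, int batchSize, int changeAfter, String lateKey) {
    List<String> batch = new ArrayList<>();
    String resumeAfter = null;
    boolean exhausted = false;
    while (!exhausted && batch.size() < batchSize) {
      // Rebuild the stack over the current data, then seek to where we left off.
      Cursor top = new Cursor(table);
      top.seek(resumeAfter);
      exhausted = true;
      while (top.hasTop() && batch.size() < batchSize) {
        String key = top.getTopKey();
        batch.add(key);
        if (batch.size() == changeAfter) {
          // Simulated data source change: new data arrives mid-scan.
          table.add(lateKey);
          resumeAfter = key; // restart just after the last key returned
          exhausted = false;
          break;
        }
        top.next();
      }
    }
    return batch;
  }
}
```

Calling `scan` on a table holding a, b, d with `changeAfter = 2` and `lateKey = "c"` rebuilds the cursor after b and picks up c on the second pass, which mirrors the exclusive-start Range created in the pseudocode above.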

> Please provide information on implementing custom iterators in the documentation
> --------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-3633
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3633
>             Project: Accumulo
>          Issue Type: Wish
>          Components: docs
>    Affects Versions: 1.6.0
>         Environment: Centos 6.5, Accumulo 1.6.0, CDH 5. 
>            Reporter: Vaibhav Thapliyal
>            Assignee: Josh Elser
>              Labels: documentation
>             Fix For: 1.7.0
>
>         Attachments: 0001-ACCUMULO-3633-User-manual-chapter-on-custom-iterator.patch
>
>
> Dear all,
> Can you please provide documentation on creating custom iterators? For example, explain what the methods of SortedKeyValueIterator do and how to override them.
> Please explain how these methods are executed (which class calls them when the iterator executes).
> I would appreciate it if these changes were made in future documentation, as this would help developers who are new to Accumulo quickly get started on writing their own custom iterators, which are an essential part of Accumulo.


