Posted to user@accumulo.apache.org by Keith Turner <ke...@deenlo.com> on 2017/01/04 15:33:49 UTC

Re: Teardown and deepCopy

Josh,

deepCopy is not called when an iterator is torn down. It has an
entirely different use: deepCopy allows cloning of an iterator during
init(). The clones give you multiple pointers into a tablet's data,
which enables things like server-side joins.

Keith
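
A minimal stand-alone sketch of the "multiple pointers" idea. It is written against a hypothetical Cursor class rather than Accumulo's SortedKeyValueIterator; the class, its methods, and the "a:"/"b:" key layout are all made up for illustration. One cursor is deep-copied, so two independent positions walk the same sorted data and a merge join can advance them separately:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified model, NOT Accumulo's SortedKeyValueIterator: a cursor over a
// shared sorted map that can be deep-copied into an independent cursor.
public class DeepCopyJoinSketch {

    static final class Cursor {
        private final NavigableMap<String, String> data; // shared, never duplicated
        private Iterator<Map.Entry<String, String>> it;
        private Map.Entry<String, String> top;

        Cursor(NavigableMap<String, String> data) { this.data = data; }

        // Position at the first key >= start.
        void seek(String start) {
            it = data.tailMap(start, true).entrySet().iterator();
            next();
        }

        boolean hasTop() { return top != null; }
        String topKey() { return top.getKey(); }
        String topValue() { return top.getValue(); }
        void next() { top = it.hasNext() ? it.next() : null; }

        // A new, independent position over the SAME underlying data.
        Cursor deepCopy() { return new Cursor(data); }
    }

    // Merge-join the IDs stored under "a:<id>" and "b:<id>" in one "tablet".
    public static List<String> join(NavigableMap<String, String> tablet) {
        Cursor left = new Cursor(tablet);
        Cursor right = left.deepCopy(); // second pointer into the same data
        left.seek("a:");
        right.seek("b:");
        List<String> out = new ArrayList<>();
        while (left.hasTop() && left.topKey().startsWith("a:")
                && right.hasTop() && right.topKey().startsWith("b:")) {
            String l = left.topKey().substring(2);
            String r = right.topKey().substring(2);
            int cmp = l.compareTo(r);
            if (cmp == 0) {           // same ID on both sides: emit the joined pair
                out.add(l + "=" + left.topValue() + "," + right.topValue());
                left.next();
                right.next();
            } else if (cmp < 0) {     // advance whichever side is behind
                left.next();
            } else {
                right.next();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> tablet = new TreeMap<>();
        tablet.put("a:1", "x");
        tablet.put("a:2", "y");
        tablet.put("a:4", "z");
        tablet.put("b:2", "p");
        tablet.put("b:3", "q");
        tablet.put("b:4", "r");
        System.out.println(join(tablet)); // [2=y,p, 4=z,r]
    }
}
```

In a real iterator the analogous step would presumably be calling source.deepCopy(env) inside init() and seeking each copy on its own; that mapping is an interpretation of the description above, not code from this thread.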

On Wed, Dec 28, 2016 at 12:50 PM, Josh Clum <jo...@gmail.com> wrote:
> Hi,
>
> I have a question about iterator teardown. It seems from
> https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/iterator_design.txt#L383-L390
> that deepCopy should be called when an iterator is torn down. I'm not seeing
> that behavior. Below is a test that sets table.scan.max.memory to 1, which
> should force a teardown after each key/value returned. I expected deepCopy
> to be called 3 times, but when I tail the TServer logs I don't see it being
> called. Below are the TServer output and the test.
>
> What am I missing here?
>
> Josh
>
> ➜  tail -f -n200 ...../accumulo/logs/TabletServer_*.out | grep MyIterator
> MyIterator: init
> MyIterator: seek
> MyIterator: hasTop
> MyIterator: getTopKey
> MyIterator: getTopValue
> MyIterator: init
> MyIterator: seek
> MyIterator: hasTop
> MyIterator: getTopKey
> MyIterator: getTopValue
> MyIterator: init
> MyIterator: seek
> MyIterator: hasTop
> MyIterator: getTopKey
> MyIterator: getTopValue
> MyIterator: init
> MyIterator: seek
> MyIterator: hasTop
>
> public static class MyIterator implements SortedKeyValueIterator<Key, Value>
> {
>
>     private SortedKeyValueIterator<Key, Value> source;
>
>     public MyIterator() { }
>
>     @Override
>     public void init(SortedKeyValueIterator<Key, Value> source,
>                      Map<String, String> options,
>                      IteratorEnvironment env) throws IOException {
>         System.out.println("MyIterator: init");
>         this.source = source;
>     }
>
>     @Override
>     public boolean hasTop() {
>         System.out.println("MyIterator: hasTop");
>         return source.hasTop();
>     }
>
>     @Override
>     public void next() throws IOException {
>         System.out.println("MyIterator: next");
>         source.next();
>     }
>
>     @Override
>     public void seek(Range range, Collection<ByteSequence> columnFamilies,
>                      boolean inclusive) throws IOException {
>         System.out.println("MyIterator: seek");
>         source.seek(range, columnFamilies, inclusive);
>     }
>
>     @Override
>     public Key getTopKey() {
>         System.out.println("MyIterator: getTopKey");
>         return source.getTopKey();
>     }
>
>     @Override
>     public Value getTopValue() {
>         System.out.println("MyIterator: getTopValue");
>         return source.getTopValue();
>     }
>
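>     @Override
>     public SortedKeyValueIterator<Key, Value> deepCopy(IteratorEnvironment env) {
>         System.out.println("MyIterator: deepCopy");
>         // Return a new MyIterator over a deep copy of the source, so the copy
>         // keeps this iterator's behavior instead of bypassing it.
>         MyIterator copy = new MyIterator();
>         copy.source = source.deepCopy(env);
>         return copy;
>     }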
> }
>
> @Test
> public void testTearDown() throws Exception {
>     String table = "test";
>     Connector conn = cluster.getConnector("root", "secret");
>     conn.tableOperations().create(table);
>     conn.tableOperations().attachIterator(table,
>             new IteratorSetting(25, MyIterator.class));
>     conn.tableOperations().setProperty(table, "table.scan.max.memory", "1");
>
>     BatchWriter writer = conn.createBatchWriter(table, new BatchWriterConfig());
>
>     Mutation m1 = new Mutation("row");
>     m1.put("f1", "q1", 1, "val1");
>     writer.addMutation(m1);
>
>     Mutation m2 = new Mutation("row");
>     m2.put("f2", "q2", 1, "val2");
>     writer.addMutation(m2);
>
>     Mutation m3 = new Mutation("row");
>     m3.put("f3", "q3", 1, "val3");
>     writer.addMutation(m3);
>
>     writer.flush();
>     writer.close();
>
>     BatchScanner scanner = conn.createBatchScanner(table, new Authorizations(), 3);
>     scanner.setRanges(Collections.singletonList(new Range()));
>     for (Map.Entry<Key, Value> entry : scanner) {
>         System.out.println(entry.getKey() + " : " + entry.getValue());
>     }
>     scanner.close(); // release the batch scanner's resources (its query threads)
>     System.out.println("Results complete!");
> }
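
A rough, stand-alone simulation of what the log above suggests, assuming each batch is served by a brand-new iterator object that is init()ed and then seeked to the range just after the last key returned. LoggingIterator and scan() are hypothetical names, getTopValue() is left out, and this is not how the tablet server is actually implemented. With a batch size of 1 it produces the same init/seek/hasTop/getTopKey rhythm as the output above and never calls deepCopy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical simulation, modeled on the log above: the server answers a scan
// in batches and builds a NEW iterator for every batch. Not Accumulo code.
public class TeardownSketch {

    static final class LoggingIterator {
        private final List<String> log;
        private Iterator<String> it;
        private String top;

        LoggingIterator(List<String> log) { this.log = log; }

        void init() { log.add("init"); }

        // Seek to the keys strictly after lastKey (null = start of the data).
        void seek(NavigableSet<String> data, String lastKey) {
            log.add("seek");
            it = (lastKey == null ? data : data.tailSet(lastKey, false)).iterator();
            top = it.hasNext() ? it.next() : null;
        }

        boolean hasTop() { log.add("hasTop"); return top != null; }
        String getTopKey() { log.add("getTopKey"); return top; }
        void next() { log.add("next"); top = it.hasNext() ? it.next() : null; }
    }

    // Return every key in data, at most batchSize (> 0) keys per batch.
    public static List<String> scan(NavigableSet<String> data, int batchSize, List<String> log) {
        List<String> results = new ArrayList<>();
        String lastKey = null;
        while (true) {
            LoggingIterator iter = new LoggingIterator(log); // fresh object: old fields are gone
            iter.init();
            iter.seek(data, lastKey); // resume just after the last key returned
            int n = 0;
            while (true) {
                if (!iter.hasTop()) {
                    return results;   // source exhausted: scan complete
                }
                lastKey = iter.getTopKey();
                results.add(lastKey);
                if (++n == batchSize) {
                    break;            // batch full: tear down without calling next()
                }
                iter.next();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<String> rows = scan(new TreeSet<>(Arrays.asList("f1", "f2", "f3")), 1, log);
        System.out.println(rows);
        System.out.println(log);
    }
}
```

Because a new object is constructed for each batch, anything kept in instance fields (a Boolean flag, a set of seen IDs) is simply gone at the next init().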

Re: Teardown and deepCopy

Posted by Dylan Hutchison <dh...@cs.washington.edu>.
During a batch scan, many tablets are scanned in parallel.  If I understand
your scenario correctly, each tablet scan will build a set of column IDs
seen so far, so that each scan can skip IDs that the scan has already seen
rather than re-transmit them.  The goal is to find the unique column IDs
across the whole scan.

In this case, when an iterator is torn down, it drops its set of
already-seen IDs and starts from scratch.

This sounds fine, as long as you can do the final de-duplication at the
client: the same ID might be retrieved from different tablets (or from
different batches of the same tablet). Check whether this meets your
performance requirements.
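
A small model of that split, under the assumption that each batch is deduplicated on its own (its seen set dies with the torn-down iterator) and the client does the final pass. The method names and the list-of-lists stand-in for tablets are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of "partial dedup on the server, final dedup on the client".
// Each inner list stands for one tablet's column qualifiers in scan order.
public class PartialDedupSketch {

    // What the tablet servers send (batchSize > 0): each batch is deduplicated
    // on its own, because the iterator's seen set is dropped at teardown.
    public static List<String> shipped(List<List<String>> tablets, int batchSize) {
        List<String> sent = new ArrayList<>();
        for (List<String> tablet : tablets) {
            for (int i = 0; i < tablet.size(); i += batchSize) {
                List<String> batch = tablet.subList(i, Math.min(i + batchSize, tablet.size()));
                sent.addAll(new LinkedHashSet<>(batch)); // per-batch dedup only
            }
        }
        return sent;
    }

    // The client finishes the job: IDs can repeat across batches and tablets.
    public static Set<String> clientDistinct(List<String> sent) {
        return new TreeSet<>(sent);
    }

    public static void main(String[] args) {
        List<List<String>> tablets = Arrays.asList(
                Arrays.asList("a", "b", "a", "c", "a"),
                Arrays.asList("c", "d", "c"));
        for (int batchSize : new int[] {2, 5}) {
            List<String> sent = shipped(tablets, batchSize);
            System.out.println("batchSize=" + batchSize + " shipped=" + sent.size()
                    + " distinct=" + clientDistinct(sent));
        }
    }
}
```

With the sample data, a batch size of 2 ships 8 IDs and a batch size of 5 ships 5, while the client ends up with {a, b, c, d} either way; larger batches simply leave less for the client to discard.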

If you need to retrieve the unique column IDs faster, you might consider
storing them in a secondary index table where the column IDs are placed in
the row.  Scanning unique IDs from the row is easy because they are sorted.
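
To illustrate why sorted IDs make this easy, here is a sketch assuming a hypothetical index layout with one entry per occurrence and the ID as the row (modeled as {row, qualifier} pairs already in sorted order; the method name is made up). Distinct-ing a sorted stream needs only the previous row, and a restarted scan can recover that from the last ID it returned, so a teardown loses nothing:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index layout: one entry per occurrence, with the ID in the row,
// modeled here as {row, qualifier} pairs already in sorted order.
public class SortedIndexDistinctSketch {

    // Return up to 'limit' (> 0) distinct IDs that sort strictly after 'afterId'
    // (null = from the beginning). The only state is the previous row, which a
    // restarted scan can rebuild from the last ID it returned.
    public static List<String> distinctIds(List<String[]> sorted, String afterId, int limit) {
        List<String> out = new ArrayList<>();
        String prev = afterId;
        for (String[] entry : sorted) {
            String row = entry[0];
            if (afterId != null && row.compareTo(afterId) <= 0) {
                continue; // stands in for seeking past rows already returned
            }
            if (!row.equals(prev)) {
                out.add(row);
                prev = row;
                if (out.size() == limit) {
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> index = new ArrayList<>();
        index.add(new String[] {"id1", "doc7"});
        index.add(new String[] {"id1", "doc9"});
        index.add(new String[] {"id2", "doc3"});
        index.add(new String[] {"id3", "doc1"});
        index.add(new String[] {"id3", "doc4"});
        List<String> first = distinctIds(index, null, 2); // first "batch"
        List<String> second = distinctIds(index, first.get(first.size() - 1), 2);
        System.out.println(first + " " + second);
    }
}
```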

On Wed, Jan 4, 2017 at 8:42 AM, Roshan Punnoose <ro...@gmail.com> wrote:

> I have a tablet with an unsorted list of IDs in the Column Qualifier;
> these IDs can repeat sporadically. So I was hoping to keep a set of these
> IDs around in memory to check whether I have seen an ID or not. There is some
> other logic to ensure that the set does not grow unbounded, but I'm just trying
> to figure out if I can keep this ID set around. With the teardown, even
> though I know from the new seek Range which Key was returned last, I
> don't know whether I have seen the upcoming IDs. Not sure if that makes sense...
>
> I was thinking that on teardown, we could use either the deepCopy or init
> method to roll over state from the torn-down iterator to the new iterator.
>
> On Wed, Jan 4, 2017 at 11:14 AM Keith Turner <ke...@deenlo.com> wrote:
>
>> On Wed, Jan 4, 2017 at 10:44 AM, Roshan Punnoose <ro...@gmail.com>
>> wrote:
>> > Keith,
>> >
>> > If an iterator has state that it is maintaining, what is the best way to
>> > transfer that state to the new iterator after a teardown? For example,
>> > MyIterator might have a Boolean flag of some sort. After teardown, is there
>> > a way to copy that state to the new iterator before it starts seeking again?
>>
>> There is nothing currently built in to help with this.
>>
>> What are you trying to accomplish?  Are you interested in maintaining
>> this state for a scan or batch scan?
>>
>>
>> >
>> > Roshan
>> >
>>
>

Re: Teardown and deepCopy

Posted by Josh Elser <jo...@gmail.com>.
I would suggest that your approach is flawed from the start. Consider 
the following case:

You read through the first half of a tablet and have collected a set of
1000 IDs which you have seen. When you try to read the second half of
the tablet, the TabletServer dies from an OOME. The tablet is moved to a
different TabletServer, which starts reading the second half of the tablet
but cannot know about any of the 1000 IDs you had collected in memory
on the other TabletServer.

Iterators are *not* designed to be stateful. Pretty much any attempt to
force them to be stateful will have some sort of inherent flaw. If you
need to maintain state, you have two options:

1. Do it outside of Accumulo: in the client or some other execution
framework (e.g. YARN, Spark, Fluo, etc.). There are many options; which
one you should use likely depends on your application.
2. Create a table schema in which all of the elements you need to
read/act on exist in one row. A row is the unit of atomicity
that Accumulo provides. This depends a bit on what the actual problem is.
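
A sketch of option 2, assuming the values to be compared have been grouped into one row (modeled as {row, qualifier} pairs in scan order; the method name is hypothetical). The seen set is cleared at every row boundary, so it never has to survive past a row. Accumulo's WholeRowIterator, which packs an entire row into a single key/value pair, is one existing iterator that may help keep a row together on its way to the client:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of option 2: all values that must be compared live in
// one row, so the "seen" state never has to outlive a row.
public class PerRowStateSketch {

    // entries: {row, qualifier} pairs, grouped by row as a sorted scan returns them.
    public static Map<String, List<String>> distinctPerRow(List<String[]> entries) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        String currentRow = null;
        Set<String> seen = new LinkedHashSet<>();
        for (String[] e : entries) {
            if (!e[0].equals(currentRow)) {
                if (currentRow != null) {
                    out.put(currentRow, new ArrayList<>(seen)); // previous row finished
                }
                currentRow = e[0];
                seen.clear(); // state is reset at every row boundary
            }
            seen.add(e[1]);
        }
        if (currentRow != null) {
            out.put(currentRow, new ArrayList<>(seen));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> entries = Arrays.asList(
                new String[] {"r1", "x"}, new String[] {"r1", "y"}, new String[] {"r1", "x"},
                new String[] {"r2", "x"}, new String[] {"r2", "x"});
        System.out.println(distinctPerRow(entries));
    }
}
```

If a scan restarts at a row boundary, the state for the next row is rebuilt from that row's data alone, which is what makes this layout tolerant of teardowns and of a tablet moving to another TabletServer.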

Roshan Punnoose wrote:
> I have a tablet with an unsorted list of IDs in the Column Qualifier;
> these IDs can repeat sporadically. So I was hoping to keep a set of
> these IDs around in memory to check whether I have seen an ID or not. There
> is some other logic to ensure that the set does not grow unbounded, but
> I'm just trying to figure out if I can keep this ID set around. With the
> teardown, even though I know from the new seek Range which Key was
> returned last, I don't know whether I have seen the upcoming IDs. Not sure
> if that makes sense...
>
> I was thinking that on teardown, we could use either the deepCopy or init
> method to roll over state from the torn-down iterator to the new iterator.
>

Re: Teardown and deepCopy

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Jan 4, 2017 at 11:57 AM, Roshan Punnoose <ro...@gmail.com> wrote:
> Keith, I would just like to ignore it. Basically, I'm just doing a distinct
> operation on the column qualifiers.
>
> Dylan, the hard part is that we are trying not to materialize the complete
> results on the client side just to do the distinct there. This piece
> is just a smaller piece of a larger query.
>
> Thanks for the help, guys. I feel like I'm trying to do something way out of
> the bounds of what Accumulo is really built to do. Just testing the bounds :)

Good luck. As Dylan suggested, partial deduplication in the tserver
for each batch plus complete deduplication on the client side is a good
option. Too bad that does not work for you. You can configure the
scanner to fetch larger batches, which could result in more
deduplication on the server side.

You could possibly do something like the following.

  Scan X keys on the client side, keeping track of seen IDs
  while(true){
    Start a new scan of X keys beginning after the last key from the
    previous scan, AND pass the seen IDs as config to a scan iterator...
    keep track of seen IDs
    If the new scan read no keys, stop
  }
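
An in-memory sketch of that loop, under stated assumptions: a NavigableMap stands in for the table (key to ID), and scan() stands in for a scan with an iterator that receives the seen IDs as an option. In a real client this would presumably be a Scanner whose Range starts just after the last key, plus an IteratorSetting carrying the seen IDs via addOption, but that wiring is not shown and every name below is hypothetical:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, in-memory version of the loop above. The NavigableMap stands
// in for the table (key -> ID in the qualifier); scan() stands in for a scan
// with an iterator that receives the seen IDs as an option.
public class ClientDrivenDistinctSketch {

    static final class Batch {
        final List<String> newIds = new ArrayList<>();
        String lastKeyRead; // null if the scan read nothing
    }

    // "Server side": read at most x keys strictly after afterKey and return
    // only IDs not in seen (and not repeated within this batch).
    static Batch scan(NavigableMap<String, String> table, String afterKey, int x, Set<String> seen) {
        Batch b = new Batch();
        Set<String> known = new HashSet<>(seen);
        NavigableMap<String, String> rest = afterKey == null ? table : table.tailMap(afterKey, false);
        int read = 0;
        for (Map.Entry<String, String> e : rest.entrySet()) {
            if (read++ == x) {
                break;
            }
            b.lastKeyRead = e.getKey();
            if (known.add(e.getValue())) {
                b.newIds.add(e.getValue());
            }
        }
        return b;
    }

    // "Client side": keep issuing scans of x keys (x > 0), carrying the last
    // key and the seen IDs forward, until a scan reads nothing.
    public static List<String> distinct(NavigableMap<String, String> table, int x) {
        Set<String> seen = new LinkedHashSet<>();
        String lastKey = null;
        while (true) {
            Batch b = scan(table, lastKey, x, seen);
            if (b.lastKeyRead == null) {
                return new ArrayList<>(seen);
            }
            seen.addAll(b.newIds);
            lastKey = b.lastKeyRead;
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("k1", "a");
        table.put("k2", "b");
        table.put("k3", "a");
        table.put("k4", "c");
        table.put("k5", "b");
        System.out.println(distinct(table, 2));
    }
}
```

Each request carries the entire seen set, so this stays cheap only while that set is small; that is where the bounding logic Roshan mentioned would matter.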

>
> Roshan
>
> On Wed, Jan 4, 2017 at 11:54 AM Keith Turner <ke...@deenlo.com> wrote:
>>
>> On Wed, Jan 4, 2017 at 11:42 AM, Roshan Punnoose <ro...@gmail.com>
>> wrote:
>> > I have a tablet with an unsorted list of IDs in the Column Qualifier;
>> > these IDs can repeat sporadically. So I was hoping to keep a set of these
>> > IDs around in memory to check whether I have seen an ID or not. There is
>> > some other
>>
>> When you see an ID again, what action do you want to take?
>>
>> > logic to ensure that the set does not grow unbounded, but I'm just trying
>> > to figure out if I can keep this ID set around. With the teardown, even
>> > though I know from the new seek Range which Key was returned last, I don't
>> > know whether I have seen the upcoming IDs. Not sure if that makes sense...
>> >
>> > I was thinking that on teardown, we could use either the deepCopy or init
>> > method to roll over state from the torn-down iterator to the new iterator.

Re: Teardown and deepCopy

Posted by Roshan Punnoose <ro...@gmail.com>.
Keith, I would just like to ignore it. Basically, I'm just doing a distinct
operation on the column qualifiers.

Dylan, the hard part is that we are trying not to materialize the complete
results on the client side just to do the distinct there. This piece is
just a smaller piece of a larger query.

Thanks for the help, guys. I feel like I'm trying to do something way out of
the bounds of what Accumulo is really built to do. Just testing the bounds :)

Roshan

On Wed, Jan 4, 2017 at 11:54 AM Keith Turner <ke...@deenlo.com> wrote:

> On Wed, Jan 4, 2017 at 11:42 AM, Roshan Punnoose <ro...@gmail.com>
> wrote:
> > I have a tablet with an unsorted list of IDs in the Column Qualifier;
> > these IDs can repeat sporadically. So I was hoping to keep a set of these
> > IDs around in memory to check whether I have seen an ID or not. There is
> > some other
>
> When you see an ID again, what action do you want to take?
>
> > logic to ensure that the set does not grow unbounded, but I'm just trying
> > to figure out if I can keep this ID set around. With the teardown, even
> > though I know from the new seek Range which Key was returned last, I don't
> > know whether I have seen the upcoming IDs. Not sure if that makes sense...
> >
> > I was thinking that on teardown, we could use either the deepCopy or init
> > method to roll over state from the torn-down iterator to the new iterator.
> >
> > On Wed, Jan 4, 2017 at 11:14 AM Keith Turner <ke...@deenlo.com> wrote:
> >>
> >> On Wed, Jan 4, 2017 at 10:44 AM, Roshan Punnoose <ro...@gmail.com>
> >> wrote:
> >> > Keith,
> >> >
> >> > If an iterator has state that it is maintaining, what is the best way to
> >> > transfer that state to the new iterator after a teardown? For example,
> >> > MyIterator might have a Boolean flag of some sort. After teardown, is there
> >> > a way to copy that state to the new iterator before it starts seeking
> >> > again?
> >>
> >> There is nothing currently built in to help with this.
> >>
> >> What are you trying to accomplish?  Are you interested in maintaining
> >> this state for a scan or batch scan?
> >>
> >>
> >> >
> >> > Roshan
> >> >
> >> > On Wed, Jan 4, 2017 at 10:33 AM Keith Turner <ke...@deenlo.com>
> wrote:
> >> >>
> >> >> Josh,
> >> >>
> >> >> Deepcopy is not called when an iterator is torn down.  It has an
> >> >> entirely different use. Deepcopy allows cloning of an iterator during
> >> >> init().  The clones allow you to have multiple pointers into a
> tablets
> >> >> data which allows things like server side joins.
> >> >>
> >> >> Keith
> >> >>
> >> >> On Wed, Dec 28, 2016 at 12:50 PM, Josh Clum <jo...@gmail.com>
> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I have a question about iterator teardown. It seems from
> >> >> >
> >> >> >
> >> >> >
> https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/iterator_design.txt#L383-L390
> >> >> > that deepCopy should be called when an iterator is torn down. I'm
> not
> >> >> > seeing
> >> >> > that behavior. Below is a test that sets table.scan.max.memory to 1
> >> >> > which
> >> >> > should force a tear down for each kv returned. I should see
> deepCopy
> >> >> > being
> >> >> > called 3 times but when I tail the Tserver logs I'm not seeing it
> >> >> > being
> >> >> > called. Below is the test and the Tserver output.
> >> >> >
> >> >> > What am I missing here?
> >> >> >
> >> >> > Josh
> >> >> >
> >> >> > ➜  tail -f -n200 ...../accumulo/logs/TabletServer_*.out | grep
> >> >> > MyIterator
> >> >> > MyIterator: init
> >> >> > MyIterator: seek
> >> >> > MyIterator: hasTop
> >> >> > MyIterator: getTopKey
> >> >> > MyIterator: getTopValue
> >> >> > MyIterator: init
> >> >> > MyIterator: seek
> >> >> > MyIterator: hasTop
> >> >> > MyIterator: getTopKey
> >> >> > MyIterator: getTopValue
> >> >> > MyIterator: init
> >> >> > MyIterator: seek
> >> >> > MyIterator: hasTop
> >> >> > MyIterator: getTopKey
> >> >> > MyIterator: getTopValue
> >> >> > MyIterator: init
> >> >> > MyIterator: seek
> >> >> > MyIterator: hasTop
> >> >> >
> >> >> > public static class MyIterator implements
> SortedKeyValueIterator<Key,
> >> >> > Value>
> >> >> > {
> >> >> >
> >> >> >     private SortedKeyValueIterator<Key, Value> source;
> >> >> >
> >> >> >     public MyIterator() { }
> >> >> >
> >> >> >     @Override
> >> >> >     public void init(SortedKeyValueIterator<Key, Value> source,
> >> >> >                      Map<String, String> options,
> >> >> >                      IteratorEnvironment env) throws IOException {
> >> >> >         System.out.println("MyIterator: init");
> >> >> >         this.source = source;
> >> >> >     }
> >> >> >
> >> >> >     @Override
> >> >> >     public boolean hasTop() {
> >> >> >         System.out.println("MyIterator: hasTop");
> >> >> >         return source.hasTop();
> >> >> >     }
> >> >> >
> >> >> >     @Override
> >> >> >     public void next() throws IOException {
> >> >> >         System.out.println("MyIterator: next");
> >> >> >         source.next();
> >> >> >     }
> >> >> >
> >> >> >     @Override
> >> >> >     public void seek(Range range, Collection<ByteSequence>
> >> >> > columnFamilies,
> >> >> > boolean inclusive) throws IOException {
> >> >> >         System.out.println("MyIterator: seek");
> >> >> >         source.seek(range, columnFamilies, inclusive);
> >> >> >     }
> >> >> >
> >> >> >     @Override
> >> >> >     public Key getTopKey() {
> >> >> >         System.out.println("MyIterator: getTopKey");
> >> >> >         return source.getTopKey();
> >> >> >     }
> >> >> >
> >> >> >     @Override
> >> >> >     public Value getTopValue() {
> >> >> >         System.out.println("MyIterator: getTopValue");
> >> >> >         return source.getTopValue();
> >> >> >     }
> >> >> >
> >> >> >     @Override
> >> >> >     public SortedKeyValueIterator<Key, Value>
> >> >> > deepCopy(IteratorEnvironment
> >> >> > env) {
> >> >> >         System.out.println("MyIterator: deepCopy");
> >> >> >         return source.deepCopy(env);
> >> >> >     }
> >> >> > }
> >> >> >
> >> >> > @Test
> >> >> > public void testTearDown() throws Exception {
> >> >> >     String table = "test";
> >> >> >     Connector conn = cluster.getConnector("root", "secret");
> >> >> >     conn.tableOperations().create(table);
> >> >> >     conn.tableOperations().attachIterator(table, new
> >> >> > IteratorSetting(25,
> >> >> > MyIterator.class));
> >> >> >     conn.tableOperations().setProperty(table,
> >> >> > "table.scan.max.memory",
> >> >> > "1");
> >> >> >
> >> >> >     BatchWriter writer = conn.createBatchWriter(table, new
> >> >> > BatchWriterConfig());
> >> >> >
> >> >> >     Mutation m1 = new Mutation("row");
> >> >> >     m1.put("f1", "q1", 1, "val1");
> >> >> >     writer.addMutation(m1);
> >> >> >
> >> >> >     Mutation m2 = new Mutation("row");
> >> >> >     m2.put("f2", "q2", 1, "val2");
> >> >> >     writer.addMutation(m2);
> >> >> >
> >> >> >     Mutation m3 = new Mutation("row");
> >> >> >     m3.put("f3", "q3", 1, "val3");
> >> >> >     writer.addMutation(m3);
> >> >> >
> >> >> >     writer.flush();
> >> >> >     writer.close();
> >> >> >
> >> >> >     BatchScanner scanner = conn.createBatchScanner(table, new
> >> >> > Authorizations(), 3);
> >> >> >     scanner.setRanges(Collections.singletonList(new Range()));
> >> >> >     for(Map.Entry<Key, Value> entry : scanner) {
> >> >> >         System.out.println(entry.getKey() + " : " +
> >> >> > entry.getValue());
> >> >> >     }
> >> >> >     System.out.println("Results complete!");
> >> >> > }
>

Re: Teardown and deepCopy

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Jan 4, 2017 at 11:42 AM, Roshan Punnoose <ro...@gmail.com> wrote:
> I have a tablet with an unsorted list of IDs in the Column Qualifier, these
> IDs can repeat sporadically. So I was hoping to keep a set of these IDs
> around in memory to check if I have seen an ID or not. There is some other

When you see an ID again, what action do you want to take?

> logic to ensure that the set does not grow unbounded, but just trying to
> figure out if I can keep this ID set around. With the teardown, even though
> I know which was the last Key to return from the new seek Range, I don't
> know if I have seen the upcoming IDs. Not sure if that makes sense...
>
> Was thinking that on teardown, we could use either the deepCopy or init
> method to rollover state from the torn down iterator to the new iterator.

Re: Teardown and deepCopy

Posted by Roshan Punnoose <ro...@gmail.com>.
I have a tablet with an unsorted list of IDs in the Column Qualifier, and
these IDs can repeat sporadically. So I was hoping to keep a set of these
IDs in memory to check whether I have already seen an ID. There is some
other logic to ensure that the set does not grow unbounded; I'm just trying
to figure out whether I can keep this ID set around. With the teardown, even
though the new seek Range tells me which Key was returned last, I don't know
whether I have already seen the upcoming IDs. Not sure if that makes sense...

Was thinking that on teardown, we could use either the deepCopy or init
method to roll over state from the torn-down iterator to the new iterator.
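One possible workaround, sketched under several assumptions: a repeated ID only needs to be detected within a single row, repeats should simply be hidden (Keith's question about what should happen is still open), and the iterator may seek its source back to the start of the row containing the resume key, which stays inside the tablet because a row never spans tablets. Instead of transferring state between instances, the new instance rebuilds it by replaying the row up to the resume key. This is plain Java with hypothetical names (DedupIterator, scan, rebuildOnSeek), not the Accumulo API; keys are "row|qualifier" strings and values are the IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch in plain Java (not the Accumulo API): hide entries whose
 * ID already appeared earlier in the same row, and rebuild the "seen" set after
 * a teardown by replaying the row up to the resume key.
 * Keys are "row|qualifier" strings (rows must not contain '|'); values are IDs.
 */
public class ReplayDedup {

  static String rowOf(String key) { return key.substring(0, key.indexOf('|')); }

  static class DedupIterator {
    private final boolean rebuildOnSeek;
    private final Set<String> seen = new HashSet<>();  // IDs seen so far in currentRow
    private NavigableMap<String, String> source;
    private String currentRow;
    private String top;                                // current key, or null

    DedupIterator(boolean rebuildOnSeek) { this.rebuildOnSeek = rebuildOnSeek; }

    void init(NavigableMap<String, String> source) { this.source = source; }

    /** Positions strictly after startExclusive (null = from the first key). */
    void seek(String startExclusive) {
      seen.clear();
      currentRow = null;
      if (startExclusive == null) {
        top = source.isEmpty() ? null : source.firstKey();
      } else {
        if (rebuildOnSeek) {
          // Replay the row from its first entry through the resume key.
          currentRow = rowOf(startExclusive);
          seen.addAll(source.subMap(currentRow + "|", true, startExclusive, true).values());
        }
        top = source.higherKey(startExclusive);
      }
      skipDuplicates();
    }

    /** Advances top past entries whose ID was already seen in the same row. */
    private void skipDuplicates() {
      while (top != null) {
        String row = rowOf(top);
        if (!row.equals(currentRow)) {  // new row: forget the previous row's IDs
          currentRow = row;
          seen.clear();
        }
        if (seen.add(source.get(top))) return;  // first occurrence: keep it
        top = source.higherKey(top);            // repeat: hide it
      }
    }

    boolean hasTop() { return top != null; }
    String getTopKey() { return top; }
    void next() { top = source.higherKey(top); skipDuplicates(); }
  }

  /** Scan that tears the iterator down after every batchSize returned entries. */
  static List<String> scan(NavigableMap<String, String> data, int batchSize, boolean rebuild) {
    List<String> results = new ArrayList<>();
    String lastKey = null;
    while (true) {
      DedupIterator it = new DedupIterator(rebuild);  // fresh instance after each teardown
      it.init(data);
      it.seek(lastKey);
      int batch = 0;
      while (it.hasTop()) {
        lastKey = it.getTopKey();
        results.add(lastKey);
        if (++batch >= batchSize) break;
        it.next();
      }
      if (batch == 0) return results;
    }
  }

  public static void main(String[] args) {
    TreeMap<String, String> data = new TreeMap<>();
    data.put("r1|q01", "id7");
    data.put("r1|q02", "id3");
    data.put("r1|q03", "id7");
    data.put("r1|q04", "id3");
    data.put("r1|q05", "id9");
    data.put("r2|q01", "id7");
    System.out.println(scan(data, 100, false)); // no teardown: [r1|q01, r1|q02, r1|q05, r2|q01]
    System.out.println(scan(data, 1, false));   // teardown without replay: repeats leak through
    System.out.println(scan(data, 1, true));    // teardown with replay: [r1|q01, r1|q02, r1|q05, r2|q01]
  }
}
```

The replay rereads part of the row on every resume, so very wide rows make it expensive. If the set is bounded by extra eviction logic, the replay would have to apply that same logic to reproduce the state exactly.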

On Wed, Jan 4, 2017 at 11:14 AM Keith Turner <ke...@deenlo.com> wrote:

> On Wed, Jan 4, 2017 at 10:44 AM, Roshan Punnoose <ro...@gmail.com> wrote:
> > Keith,
> >
> > If an iterator has state that it is maintaining, what is the best way to
> > transfer that state to the new iterator after a tear down?  For example,
> > MyIterator might have a Boolean flag of some sort. After tear down, is
> > there a way to copy that state to the new iterator before it starts
> > seeking again?
>
> There is nothing currently built in to help with this.
>
> What are you trying to accomplish?  Are you interested in maintaining
> this state for a scan or batch scan?

Re: Teardown and deepCopy

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Jan 4, 2017 at 10:44 AM, Roshan Punnoose <ro...@gmail.com> wrote:
> Keith,
>
> If an iterator has state that it is maintaining, what is the best way to
> transfer that state to the new iterator after a tear down?  For example,
> MyIterator might have a Boolean flag of some sort. After tear down, is there
> a way to copy that state to the new iterator before it starts seeking again?

There is nothing currently built in to help with this.

What are you trying to accomplish?  Are you interested in maintaining
this state for a scan or batch scan?
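A toy model in plain Java (not Accumulo code; every class and method name here is hypothetical) of the resume behavior discussed above: after each teardown the server constructs a new iterator, calls init(), and seeks to a range that starts just after the last key it returned. Any field set on the old instance, such as a Boolean flag, is simply gone, and deepCopy() is never involved. The exact resume logic inside the tablet server is an assumption; the model only mirrors the init/seek pattern in Josh's log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model in plain Java (not Accumulo code) of a scan that is torn down
 * after every batchSize entries. Each resume builds a brand-new iterator,
 * calls init() and then seeks strictly past the last returned key; nothing is
 * copied from the old instance and deepCopy() plays no part.
 */
public class TeardownModel {

  /** Hypothetical stand-in for a user iterator with some per-instance state. */
  static class StatefulIterator {
    private NavigableMap<String, String> source;
    private String top;          // current key, or null when exhausted
    int returnedByThisInstance;  // instance state: starts at 0 after every teardown

    void init(NavigableMap<String, String> source) { this.source = source; }

    /** Positions on the first key strictly after startExclusive (null = first key). */
    void seek(String startExclusive) {
      if (startExclusive == null) {
        top = source.isEmpty() ? null : source.firstKey();
      } else {
        top = source.higherKey(startExclusive);
      }
    }

    boolean hasTop() { return top != null; }
    String getTopKey() { return top; }
    void next() { top = source.higherKey(top); }
  }

  /** Runs the scan; records how many entries each iterator instance returned. */
  static List<String> scan(NavigableMap<String, String> data, int batchSize,
                           List<Integer> perInstance) {
    List<String> results = new ArrayList<>();
    String lastKey = null;
    while (true) {
      StatefulIterator it = new StatefulIterator();  // fresh object, fresh state
      it.init(data);
      it.seek(lastKey);                              // resume just past the last key
      int batch = 0;
      while (it.hasTop()) {
        lastKey = it.getTopKey();
        results.add(lastKey);
        it.returnedByThisInstance++;
        if (++batch >= batchSize) break;  // batch full: tear down without calling next()
        it.next();
      }
      perInstance.add(it.returnedByThisInstance);
      if (batch == 0) return results;     // the resumed seek found nothing left
    }
  }

  public static void main(String[] args) {
    TreeMap<String, String> data = new TreeMap<>();
    data.put("row f1:q1", "val1");
    data.put("row f2:q2", "val2");
    data.put("row f3:q3", "val3");
    List<Integer> perInstance = new ArrayList<>();
    System.out.println(scan(data, 1, perInstance));  // [row f1:q1, row f2:q2, row f3:q3]
    System.out.println(perInstance);                 // [1, 1, 1, 0]
  }
}
```

With a batch size of 1 and three entries the model creates four instances, the last of which finds nothing, which is the same shape as the four init/seek pairs in the log.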



Re: Teardown and deepCopy

Posted by Roshan Punnoose <ro...@gmail.com>.
Keith,

If an iterator has state that it is maintaining, what is the best way to
transfer that state to the new iterator after a teardown? For example,
MyIterator might have a Boolean flag of some sort. After a teardown, is
there a way to copy that state to the new iterator before it starts seeking
again?

Roshan

On Wed, Jan 4, 2017 at 10:33 AM Keith Turner <ke...@deenlo.com> wrote:

> Josh,
>
> Deepcopy is not called when an iterator is torn down.  It has an
> entirely different use. Deepcopy allows cloning of an iterator during
> init().  The clones allow you to have multiple pointers into a tablets
> data which allows things like server side joins.
>
> Keith
>
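Keith's description of deepCopy can be illustrated with a toy model in plain Java (not the Accumulo API; Cursor, intersect and the "term|docId" key layout are all hypothetical). One cursor is deep-copied so that two independent positions walk different parts of the same sorted data, and the pair leapfrog forward to find documents listed under both terms. Accumulo's IntersectingIterator is, roughly, built on the same idea of deep-copying its source per term during init(), but the details here are simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Toy illustration in plain Java (not the Accumulo API) of what deepCopy is
 * for: independent pointers into the same sorted data, used here to intersect
 * two terms. Keys are "term|docId" strings; docIds compare as plain strings.
 */
public class DeepCopyJoin {

  /** Minimal cursor over shared sorted data. */
  static class Cursor {
    private final NavigableSet<String> data;  // shared between copies, never duplicated
    private String top;                       // current key, or null when exhausted

    Cursor(NavigableSet<String> data) { this.data = data; }

    /** Same underlying data, separate position. */
    Cursor deepCopy() { return new Cursor(data); }

    void seek(String startInclusive) { top = data.ceiling(startInclusive); }
    String getTopKey() { return top; }
    void next() { top = data.higher(top); }
  }

  static boolean inTerm(Cursor c, String term) {
    return c.getTopKey() != null && c.getTopKey().startsWith(term + "|");
  }

  static String docOf(String key) { return key.substring(key.indexOf('|') + 1); }

  /** Doc IDs listed under both termA and termB, found by leapfrogging two cursors. */
  static List<String> intersect(NavigableSet<String> data, String termA, String termB) {
    Cursor a = new Cursor(data);
    Cursor b = a.deepCopy();  // second pointer into the same data
    a.seek(termA + "|");
    b.seek(termB + "|");
    List<String> docs = new ArrayList<>();
    while (inTerm(a, termA) && inTerm(b, termB)) {
      int cmp = docOf(a.getTopKey()).compareTo(docOf(b.getTopKey()));
      if (cmp == 0) {
        docs.add(docOf(a.getTopKey()));
        a.next();
        b.next();
      } else if (cmp < 0) {
        a.seek(termA + "|" + docOf(b.getTopKey()));  // jump a forward to b's doc
      } else {
        b.seek(termB + "|" + docOf(a.getTopKey()));  // jump b forward to a's doc
      }
    }
    return docs;
  }

  public static void main(String[] args) {
    NavigableSet<String> data = new TreeSet<>(List.of(
        "apple|d1", "apple|d3", "apple|d4",
        "banana|d2", "banana|d3", "banana|d4",
        "cherry|d1"));
    System.out.println(intersect(data, "apple", "banana"));  // [d3, d4]
    System.out.println(intersect(data, "apple", "cherry"));  // [d1]
  }
}
```

In this model the copy shares the data and only has its own position, which is why cloning is useful while setting up a join and has nothing to do with rebuilding an iterator after a teardown.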