You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by ivakegg <gi...@git.apache.org> on 2017/05/25 18:36:25 UTC

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

GitHub user ivakegg opened a pull request:

    https://github.com/apache/accumulo/pull/260

    ACCUMULO-4643 initial implementation

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivakegg/accumulo ACCUMULO-4643

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #260
    
----
commit 8100aacd8f79d6b670e65505a272b303b552750f
Author: Ivan Bella <iv...@bella.name>
Date:   2017-05-25T18:22:24Z

    ACCUMULO-4643 initial implementation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118569076
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,71 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import com.google.common.collect.Iterators;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.hadoop.conf.Configuration;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 600000;
    +  }
    +
    +  @Override
    +  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
    +    cfg.setNumTservers(1);
    +  }
    +
    +  @Test
    +  public void test() throws Exception {
    +    // make a table
    +    final String tableName = getUniqueNames(1)[0];
    +    final Connector conn = getConnector();
    +    conn.tableOperations().create(tableName);
    +
    +    // make a scanner for a table with 10 keys
    +    final Scanner scanner = conn.createScanner(tableName, Authorizations.EMPTY);
    +    final IteratorSetting cfg = new IteratorSetting(100, YieldingIterator.class);
    +    cfg.addOption(YieldingIterator.NUMBER_OF_KEYS, "10");
    --- End diff --
    
    Sorry, saw your second note about tweaking the test after leaving this :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @phrocker I apologize, but either I am missing something or I have not adequately explained what is going on here.  First and foremost, no iterator will ever yield unless somebody explicitly writes an iterator that does.  Yields are never forced.  This is completely cooperative yielding.  If somebody writes an iterator that does yield, then the iterator should only yield if it is taking too much time between keys being returned.  In other words, if the iterator decides to yield often then it will suffer in overall performance.  There will only be an increase in RPC calls if the iterator yields when there are keys in the buffer, but we have not reached scan max memory.  Yes, long scans will continue to be long.  This mechanism is simply to allow the system being developed to avoid starvation of short running scans if they cannot afford to increase the number of threads.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119133690
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +589,10 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +      } catch (ScanYieldException sye) {
    +        log.debug("Scan yield exception detected at position " + sye.getPosition());
    +        addUnfinishedRange(lookupResult, range, sye.getPosition(), false);
    --- End diff --
    
    Each key that is actually returned by the iterator will be returned to the user.  The ScanYieldException would be thrown within the next call (presumably because it has been doing a lot of scanning and not actually finding any keys to return).  The key placed in the exception would where the next call left off before finding the next key to return.  All keys already returned by getTopKey() after successfull next calls would still be returned appropriately.
    If you follow the timesUp or the exceededMemoryUsage used here, you will see that it is essentially follows the same paradigm of yielding the scan until later.  I am just supplying a way for the underlying iterator to force this to happen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120022654
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    --- End diff --
    
    Dude, trying to be funny at the same time denoting that I disagree with this nit.  Perhaps my lisp background as something to do with it but I guess that is my cross to bear.  I certainly appreciate your efforts and time.  I will buy you a beer if it would help.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    still working the test cases, but this code should show you the idea....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    Have you addressed Josh's concern about RPC calls? Further if you profile the context switching involved with the threads migrating is the net benefit better than stopping the scan entirely? It stands to reason that a long running scan will continue to be long running.  
    
    Finally, since this is mostly uncooperative yielding, is there any way to measure the fringe conditions, as per metrics that Josh alluded to? Such as average time before forced yield, etc? I think if you're baking this into SKVI that should be an expectation of any initial implementation to address the concern of provability. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120448323
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -565,7 +572,7 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
             else
               mmfi.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
     
    -        while (mmfi.hasTop()) {
    +        while (!yield.hasYielded() && mmfi.hasTop()) {
    --- End diff --
    
    I think this is a reasonable requirement on the iterator implementation. I will code to that and add a comment to the hasTop() method in the SKVI.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120022498
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    --- End diff --
    
    > I, as an operator, want to know whether this has been happening on a tserver or not.
    
    Repeating my previous comment: "Metrics != logging". I believe you've proved my point in your response. If you want to know how often yielding is happening, you should look at the metrics you implemented and not place extra cruft in the log to just be removed in every other situation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119114135
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/functional/YieldingIterator.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test.functional;
    +
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import org.apache.accumulo.core.data.ByteSequence;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.IteratorEnvironment;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.WrappingIterator;
    +import org.apache.accumulo.core.iterators.user.ScanYieldException;
    +
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * This iterator will throw a ScanYieldException every other key
    + */
    +public class YieldingIterator extends WrappingIterator {
    +  private static final Logger log = LoggerFactory.getLogger(YieldingIterator.class);
    +  private static final AtomicInteger yieldNexts = new AtomicInteger(0);
    +  private static final AtomicInteger yieldSeeks = new AtomicInteger(0);
    +  private static final AtomicInteger rebuilds = new AtomicInteger(0);
    +
    +  private static final AtomicBoolean yieldNextKey = new AtomicBoolean(false);
    +  private static final AtomicBoolean yieldSeekKey = new AtomicBoolean(false);
    +
    +  @Override
    +  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    +    YieldingIterator it = new YieldingIterator();
    +    it.setSource(it.getSource().deepCopy(env));
    +    return it;
    +  }
    +
    +  @Override
    +  public void next() throws IOException {
    +    log.info("start YieldingIterator.next: " + getTopValue());
    +    yieldNextKey.set(!yieldNextKey.get());
    +    if (yieldNextKey.get()) {
    +      log.info("YieldingIterator.next yielding: " + getTopValue());
    +      yieldNexts.incrementAndGet();
    +      throw new ScanYieldException(getTopKey());
    +    }
    +    super.next();
    +    log.info("end YieldingIterator.next: " + (hasTop() ? getTopKey() + " " + getTopValue() : "no top"));
    +  }
    +
    +  @Override
    +  public Value getTopValue() {
    +    String value = Integer.toString(yieldNexts.get()) + ',' + Integer.toString(yieldSeeks.get()) + ',' + Integer.toString(rebuilds.get());
    +    return new Value(value);
    +  }
    +
    +  @Override
    +  public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    +    log.info("start YieldingIterator.seek: " + getTopValue() + " with range " + range);
    --- End diff --
    
    shouldn't this be log.info("start YieldingIterator.seek: {} with range {}",getTopValue(),range); ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119597285
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +764,22 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      log.debug("Scan yield exception detected at position " + continueKey);
    --- End diff --
    
    Error message is probably from earlier version of code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118568742
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,71 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import com.google.common.collect.Iterators;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.hadoop.conf.Configuration;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 600000;
    --- End diff --
    
    Seems really long to read 10 entries :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119135392
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,106 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import org.apache.accumulo.core.client.BatchWriter;
    +import org.apache.accumulo.core.client.BatchWriterConfig;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Mutation;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.commons.lang.StringUtils;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.io.Text;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +  Logger log = LoggerFactory.getLogger(YieldScannersIT.class);
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 60;
    +  }
    +
    +  @Override
    +  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
    +    cfg.setNumTservers(1);
    +  }
    +
    +  @Test
    +  public void test() throws Exception {
    +    // make a table
    +    final String tableName = getUniqueNames(1)[0];
    +    final Connector conn = getConnector();
    +    conn.tableOperations().create(tableName);
    +    final BatchWriter writer = conn.createBatchWriter(tableName, new BatchWriterConfig());
    +    for (int i = 0; i < 10; i++) {
    +      byte[] row = new byte[] {(byte) ('a' + i)};
    +      Mutation m = new Mutation(new Text(row));
    +      m.put(new Text(), new Text(), new Value());
    +      writer.addMutation(m);
    +    }
    +    writer.flush();
    +    writer.close();
    +
    +    log.info("Creating scanner");
    +    // make a scanner for a table with 10 keys
    +    final Scanner scanner = conn.createScanner(tableName, Authorizations.EMPTY);
    +    final IteratorSetting cfg = new IteratorSetting(100, YieldingIterator.class);
    +    scanner.addScanIterator(cfg);
    +
    +    log.info("iterating");
    +    Iterator<Map.Entry<Key,Value>> it = scanner.iterator();
    +    int keyCount = 0;
    +    int yieldNextCount = 0;
    +    int yieldSeekCount = 0;
    +    while (it.hasNext()) {
    +      Map.Entry<Key,Value> next = it.next();
    +      log.info(Integer.toString(keyCount) + ": Got key " + next.getKey() + " with value " + next.getValue());
    +
    +      // verify we got the expected key
    +      char expected = (char) ('a' + keyCount);
    --- End diff --
    
    Using a constant (instead of inline) that defines why it's being used makes much more sense to me and can avoid the implicit casts between `char` and `int`. Being a bit more terse would help with clarity, but whatever if you disagree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119164956
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/functional/YieldingIterator.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test.functional;
    +
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import org.apache.accumulo.core.data.ByteSequence;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.IteratorEnvironment;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.WrappingIterator;
    +import org.apache.accumulo.core.iterators.user.ScanYieldException;
    +
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * This iterator will throw a ScanYieldException every other key
    + */
    --- End diff --
    
    Shall do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119991567
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    --- End diff --
    
    nit: parens are clearer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119457446
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/SortedKeyValueIterator.java ---
    @@ -147,4 +147,35 @@
        *              if not supported.
        */
       SortedKeyValueIterator<K,V> deepCopy(IteratorEnvironment env);
    +
    +  /**
    +   * Called when the client (i.e. calling iterator or the tserver) will support this iterators ability to yield. If called (and true is returned), then when
    +   * next or seek is called this iterator may yield before finding the next top value and key. When this happens, the yield position (returned by
    +   * getYieldPosition) will be used when this iterator is rebuilt to seek to the position at which it yielded. This iterator can only yield if the user of this
    +   * iterator has called this method. Note that setYieldSupported will never be called in row isolation mode.
    +   *
    +   * @return <tt>boolean</tt> True if this iterator supported yielding, false otherwise.
    +   */
    +  default boolean setYieldSupported() {
    +    return false;
    +  }
    +
    +  /**
    +   * If setYieldSupported has been called and true was returned, then this method will be called after every seek and next call.
    +   *
    +   * @return <tt>boolean</tt> true if the previous next or seek call has yielded without finding the next top key.
    +   */
    +  default boolean hasYielded() {
    +    return false;
    +  }
    +
    +  /**
    +   * If hasYielded returned true, then this will return the key which will be used as the start key (non-inclusive) of the range in the next seek call. No other
    +   * method will be called except for seek after calling this method.
    +   *
    +   * @return <tt>K</tt>
    +   */
    +  default K getYieldPosition() {
    --- End diff --
    
    Should this throw an IllegalStateException if hasYielded() returns false?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119137299
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/SortedKeyValueIterator.java ---
    @@ -69,6 +70,10 @@
        *              if called before seek.
        * @exception NoSuchElementException
        *              if next element doesn't exist.
    +   * @exception ScanYieldException
    +   *              Thrown if the iterator is permitting the scan to be torn down to allow other scans to use this thread. An iterator may throw this if it had
    --- End diff --
    
    good catch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    Changed the API to be enableYielding(YieldCallback) instead of separate methods on the SKVI


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120161878
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    --- End diff --
    
    That is a good start, however you would also need to pas the callback down to the source in the setYieldEnabled call and would need to modify the seek call to un-transform the key when seeking the source.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119134616
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/functional/YieldingIterator.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test.functional;
    +
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import org.apache.accumulo.core.data.ByteSequence;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.IteratorEnvironment;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.WrappingIterator;
    +import org.apache.accumulo.core.iterators.user.ScanYieldException;
    +
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * This iterator will throw a ScanYieldException every other key
    + */
    +public class YieldingIterator extends WrappingIterator {
    +  private static final Logger log = LoggerFactory.getLogger(YieldingIterator.class);
    +  private static final AtomicInteger yieldNexts = new AtomicInteger(0);
    +  private static final AtomicInteger yieldSeeks = new AtomicInteger(0);
    +  private static final AtomicInteger rebuilds = new AtomicInteger(0);
    +
    +  private static final AtomicBoolean yieldNextKey = new AtomicBoolean(false);
    +  private static final AtomicBoolean yieldSeekKey = new AtomicBoolean(false);
    +
    +  @Override
    +  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    +    YieldingIterator it = new YieldingIterator();
    +    it.setSource(it.getSource().deepCopy(env));
    +    return it;
    +  }
    +
    +  @Override
    +  public void next() throws IOException {
    +    log.info("start YieldingIterator.next: " + getTopValue());
    +    yieldNextKey.set(!yieldNextKey.get());
    +    if (yieldNextKey.get()) {
    +      log.info("YieldingIterator.next yielding: " + getTopValue());
    +      yieldNexts.incrementAndGet();
    +      throw new ScanYieldException(getTopKey());
    +    }
    +    super.next();
    +    log.info("end YieldingIterator.next: " + (hasTop() ? getTopKey() + " " + getTopValue() : "no top"));
    +  }
    +
    +  @Override
    +  public Value getTopValue() {
    +    String value = Integer.toString(yieldNexts.get()) + ',' + Integer.toString(yieldSeeks.get()) + ',' + Integer.toString(rebuilds.get());
    --- End diff --
    
    sure



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119142159
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    + * scans to dominate all of the scan slots (readahead threads) and starve other scans out.
    + */
    +public class ScanYieldException extends RuntimeException {
    +  private final Key position;
    +
    +  public ScanYieldException(Key position) {
    --- End diff --
    
    the seek() would pass in the key at which it left off (somewhere past the start of the range)
    the next() would also pass in the key at which it left which should not be the last key returned by the iterator.
    
    It appears that I have not adequately described the process here.  Here is a scenario that might help understanding:
    
    Lets say we have a tablet with keys a, b, c, ... z.
    Someone writes an iterator which it to only return keys b, and z.
    Seek is called with a range of (-inf, inf).
    The iterator starts scanning and finds key b and returns normally.
    The tablet calls hasTop-> true and calls getTopKey() which returns key b.
    The tablet then calls next.
    The Iterator next call continues scanning and gets to key m and wants to return control to the tablet so it throws the ScanYieldException setting the key to m
    The tablet comes back sometime later and calls seek with a range of (m, inf).
    The iterator seek call continues scanning the underlying keys and gets to key u and wants to return control to the tablet so it throws a ScanYieldException setting the key to u
    The tablet comes back sometime later and calls seek with a range of (u, inf)
    The iterator seek calls continues scanning and gets to key z and returns normally.
    The tablet calls hasTop-> true and calls getTopKey() which returns key z.
    The tablet calls next()
    Next returns immediately as there are no more keys
    The table calls hasTop -> false and the scan is complete.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119136681
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    Not too sure I am following you here.  Lets say that seek gets called with a range.  The underlying iterator would presumably seek its source with the same range.  Now the seek is iterating through keys past the initial start of the range and realizes at some point that it has been passing by many of the underlying keys for whatever reason without returning control to the Tablet.  Now it throws a ScanYieldException specifying where it left off in the seek.  Later the Tablet will come back and rebuild the iterator and seek it using the new start as specified.  The seek can now continue doing its job until it actually finds a key that is to be returned.  Where in this scenario is a key missed?  The whole point of throwing the exception is that we have not yet found a key to return to the user.  There is no missed data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118568041
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,71 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import com.google.common.collect.Iterators;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.hadoop.conf.Configuration;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 600000;
    +  }
    +
    +  @Override
    +  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
    +    cfg.setNumTservers(1);
    +  }
    +
    +  @Test
    +  public void test() throws Exception {
    +    // make a table
    +    final String tableName = getUniqueNames(1)[0];
    +    final Connector conn = getConnector();
    +    conn.tableOperations().create(tableName);
    +
    +    // make a scanner for a table with 10 keys
    +    final Scanner scanner = conn.createScanner(tableName, Authorizations.EMPTY);
    +    final IteratorSetting cfg = new IteratorSetting(100, YieldingIterator.class);
    +    cfg.addOption(YieldingIterator.NUMBER_OF_KEYS, "10");
    --- End diff --
    
    What about just writing 10 entries to the table and then using your iterator to read them all.
    
    IIUC, the client should see no difference with this change in place (just the number of RPCs changes).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119115828
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +589,10 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +      } catch (ScanYieldException sye) {
    +        log.debug("Scan yield exception detected at position " + sye.getPosition());
    +        addUnfinishedRange(lookupResult, range, sye.getPosition(), false);
    --- End diff --
    
    What happens to data which is buffered to be sent back to the client, but has not yet been sent? (the buffer controlled by the tserver scan max memory property). I think the client would see this data, but I wanted to dbl check with you as you're more familiar than I am now :)
    
    Specifically, say we have a buffer which can hold 500 Key-Values. An Iterator gets through 250 K-V's, and then throws this exception. We mark the `Range` as unfinished, but does the client see those 250 K-V's that were produced by the Iterator?
    
    It seems like `addUnfinishedRange(..)` is then determining for us if we need to do any more work on the current range (e.g. handling the case where we "timed out" after computing the last k-v for the Range).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119992092
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/SourceSwitchingIterator.java ---
    @@ -126,14 +142,23 @@ public void next() throws IOException {
     
       private void readNext(boolean initialSeek) throws IOException {
     
    +    // we need to check here if we were yielded in case the source was switched out and re-seeked by someone else (minor compaction/InMemoryMap)
    +    boolean yielded = (yield.isPresent() && yield.get().hasYielded());
    +
         // check of initialSeek second is intentional so that it does not short
         // circuit the call to switchSource
    -    boolean seekNeeded = (!onlySwitchAfterRow && switchSource()) || initialSeek;
    +    boolean seekNeeded = yielded || (!onlySwitchAfterRow && switchSource()) || initialSeek;
     
         if (seekNeeded)
           if (initialSeek)
             iter.seek(range, columnFamilies, inclusive);
    -      else
    +      else if (yielded) {
    +        Key yieldPosition = yield.get().getPositionAndReset();
    +        if (!range.contains(yieldPosition)) {
    +          throw new IOException("Underlying iterator yielded to a position outside of its range: " + yieldPosition + " not in " + range);
    --- End diff --
    
    NextBatchTask.run()
        } catch (Throwable e) {
          log.warn("exception while scanning tablet " + (scanSession == null ? "(unknown)" : scanSession.extent), e);
          addResult(e);
        }
    or 
    Tablet.lookup:
        } catch (IOException ioe) {
          dataSource.close(true);
          throw ioe;
        }
    
    Both of those cases should result in an exception client side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119597061
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +591,11 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +        if (yield.hasYielded()) {
    +          Key yieldPosition = yield.getPositionAndReset();
    +          log.debug("Scan yield exception detected at position " + yieldPosition);
    --- End diff --
    
    Error message is probably from earlier version of code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119116912
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/functional/YieldingIterator.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test.functional;
    +
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import org.apache.accumulo.core.data.ByteSequence;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.IteratorEnvironment;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.WrappingIterator;
    +import org.apache.accumulo.core.iterators.user.ScanYieldException;
    +
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * This iterator will throw a ScanYieldException every other key
    + */
    +public class YieldingIterator extends WrappingIterator {
    +  private static final Logger log = LoggerFactory.getLogger(YieldingIterator.class);
    +  private static final AtomicInteger yieldNexts = new AtomicInteger(0);
    +  private static final AtomicInteger yieldSeeks = new AtomicInteger(0);
    +  private static final AtomicInteger rebuilds = new AtomicInteger(0);
    +
    +  private static final AtomicBoolean yieldNextKey = new AtomicBoolean(false);
    +  private static final AtomicBoolean yieldSeekKey = new AtomicBoolean(false);
    +
    +  @Override
    +  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    +    YieldingIterator it = new YieldingIterator();
    +    it.setSource(it.getSource().deepCopy(env));
    +    return it;
    +  }
    +
    +  @Override
    +  public void next() throws IOException {
    +    log.info("start YieldingIterator.next: " + getTopValue());
    +    yieldNextKey.set(!yieldNextKey.get());
    +    if (yieldNextKey.get()) {
    +      log.info("YieldingIterator.next yielding: " + getTopValue());
    +      yieldNexts.incrementAndGet();
    +      throw new ScanYieldException(getTopKey());
    +    }
    +    super.next();
    +    log.info("end YieldingIterator.next: " + (hasTop() ? getTopKey() + " " + getTopValue() : "no top"));
    +  }
    +
    +  @Override
    +  public Value getTopValue() {
    +    String value = Integer.toString(yieldNexts.get()) + ',' + Integer.toString(yieldSeeks.get()) + ',' + Integer.toString(rebuilds.get());
    --- End diff --
    
    Tricky :)
    
    Could you add a quick sentence to the class-level javadoc describing this? Would help when the next person comes along to this Iterator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119618591
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +764,22 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      log.debug("Scan yield exception detected at position " + continueKey);
    --- End diff --
    
    thank you


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119992401
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    --- End diff --
    
    Not sure I agree.  Yielding should be a relatively infrequent occurence.  I, as an operator, want to know whether this has been happening on a tserver or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119164669
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    True, if the implementer of the iterator does not return an appropriate place to continue in the exception then the iterator will not work correctly.  If the implementer returns the start of the initial range in the exception thrown by the seek, then I expect the iterator will never actually do anything as it will also be restarting from the beginning of the range and then (if deterministically implemented) will throw an exception at the same point with the same start every single time.
    I will try to update the documentation to be more clear.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118567373
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    --- End diff --
    
    Let's also add a note to `SortedKeyValueIterator#next()` and `SortedKeyValueIterator#seek(...)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119897094
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    --- End diff --
    
    Same here about trace instead of debug. This wouldn't help an operator figure out what's going on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > What does this mean?
    
    When I was writing a yielding filter, when I set the yield key I was not sure if it would continue at that key or after it.
    
    > Are you suggesting another jira ticket here? I have not changed any of the scan scheduling mechanism.
    
    No, it was just something I observed and didn't fully understand.  I am going to look into it some more today.  I was expecting the CPU utilization to be 100% in this case, because there was plenty of work to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @ivakegg  Yields are not aware of other yields and thus are completely independent and thus cannot cooperate with any scheduling mechanism.  My old Operating System book calls this "uncooperative yielding." But I can see how this can be confusing. Let's call it isolated yielding.
    
    To your point that "they do it to themselves." Well, since an iterator is one amongst a stack and you could have a multi-user system, if you had one iterator that would skip just five more keys before completing, but is pre-empted due to another iterator, you have the potential for a yield when one is not desired. The only way to combat this would be solid metrics. You don't know how many increased RPC calls there are. This can increase RPCs if you simply set the key yield incorrectly.  You don't know I/O load and how many keys being skipped is reasonable without these metrics. Further, one key is not the same as another key. Parts of a table could have much smaller keys, so again, these metrics prove everything by telling us: how much time spent before yield, size of keys skipped, etc, etc
    
     Hence those metrics would be useful to show if this mechanism works as intended in production. 
    
    Then, after metrics, a nice to have would be a mechanism that allows the entire scan to stop. If you are going to put a limit and "yield." You must have a cessation point. Agree that long running scans can happen, but the RPC increase and context switching is a problem that we cannot stop with the current solution. You also need a point at which you have yielded enough and thus must sop entirely. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    I shall rebase and rewrite the message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    I like YieldCallback, its a lot cleaner than the all of the method on SKVI in the previous iteration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119991603
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    +  }
    +
    +  /**
    +   * Called by the client to get the yield position used as the start key (non-inclusive) of the range in a subsequent seek call when the iterator is rebuilt.
    +   * This will also reset the state returned by hasYielded.
    +   *
    +   * @return <tt>K</tt> The key position
    +   */
    +  public K getPositionAndReset() {
    --- End diff --
    
    These would not be invoked across threads, however I can state that clearly.  Synchronization would be wasted overhead here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > We are in the process of getting this put onto a larger cluster to test its efficacy.
    
    Awesome :D. Should I take that as avoiding merging this until that testing is complete?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119895340
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/SourceSwitchingIterator.java ---
    @@ -126,14 +142,23 @@ public void next() throws IOException {
     
       private void readNext(boolean initialSeek) throws IOException {
     
    +    // we need to check here if we were yielded in case the source was switched out and re-seeked by someone else (minor compaction/InMemoryMap)
    +    boolean yielded = (yield.isPresent() && yield.get().hasYielded());
    +
         // check of initialSeek second is intentional so that it does not short
         // circuit the call to switchSource
    -    boolean seekNeeded = (!onlySwitchAfterRow && switchSource()) || initialSeek;
    +    boolean seekNeeded = yielded || (!onlySwitchAfterRow && switchSource()) || initialSeek;
     
         if (seekNeeded)
           if (initialSeek)
             iter.seek(range, columnFamilies, inclusive);
    -      else
    +      else if (yielded) {
    +        Key yieldPosition = yield.get().getPositionAndReset();
    +        if (!range.contains(yieldPosition)) {
    +          throw new IOException("Underlying iterator yielded to a position outside of its range: " + yieldPosition + " not in " + range);
    --- End diff --
    
    How does the client handle this IOException being thrown? I'm hoping that it would propagate as a ThriftApplicationException server-side and an AccumuloException client side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118574720
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,71 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import com.google.common.collect.Iterators;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.hadoop.conf.Configuration;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 600000;
    +  }
    +
    +  @Override
    +  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
    +    cfg.setNumTservers(1);
    +  }
    +
    +  @Test
    +  public void test() throws Exception {
    +    // make a table
    +    final String tableName = getUniqueNames(1)[0];
    +    final Connector conn = getConnector();
    +    conn.tableOperations().create(tableName);
    +
    +    // make a scanner for a table with 10 keys
    +    final Scanner scanner = conn.createScanner(tableName, Authorizations.EMPTY);
    +    final IteratorSetting cfg = new IteratorSetting(100, YieldingIterator.class);
    +    cfg.addOption(YieldingIterator.NUMBER_OF_KEYS, "10");
    --- End diff --
    
    I could write the entries to the table, but the iterator I am using simulates that just fine.  I will adjust the test case anyway....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120022417
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    --- End diff --
    
    I'm not sure if you think you're being funny or something, but it's not funny to me. I don't appreciate what feels like mocking when I'm making the time to review your code.
    
    In case the meaning of the phrase is lost: "nit" as in "nit-picky" -- a minor thing that can be improved/fixed. However, in this case because I'm now riled up, I don't understand how you think there is _any_ ambiguity with `return this.key != null` what the parentheses are providing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @joshelser @phrocker Exactly what metrics are we looking for here?  Are we looking for simply how many times we see a scan yielding?  Are we looking for the time between the beginning of the next/seek call and the yield?  I don't see anything that times next and seek calls now.....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119895044
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/SourceSwitchingIterator.java ---
    @@ -126,14 +142,23 @@ public void next() throws IOException {
     
       private void readNext(boolean initialSeek) throws IOException {
     
    +    // we need to check here if we were yielded in case the source was switched out and re-seeked by someone else (minor compaction/InMemoryMap)
    +    boolean yielded = (yield.isPresent() && yield.get().hasYielded());
    --- End diff --
    
    nit: parens are unnecessary


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119149568
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    + * scans to dominate all of the scan slots (readahead threads) and starve other scans out.
    + */
    +public class ScanYieldException extends RuntimeException {
    +  private final Key position;
    +
    +  public ScanYieldException(Key position) {
    --- End diff --
    
    I was attempting to get you past the incorrect statement of "For seek(), the start key of the Range passed to the Iterator".  The key will not be the start key of the range passed to the iterator.  Instead it will be the key where the seek left off in the underlying scan such that the next seek call will let it continue where it left off.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @joshelser Is there an example of creating a new metric, or is it still all JMX based?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > Exactly what metrics are we looking for here?
    
    A trivial counter for the number of times a Scanner yield'ed is what I was thinking. That will let us measure it over time (Metrics2's "windowing" stuff going on in the background). There is a "MutableCounter" which you can instantiate from the MetricsRegistry object in Metrics2TabletServerScanMetrics which should precisely what you need (I really am not be facetious/sarcastic implying that this should be quite simple)
    
    > Are we looking for simply how many times we see a scan yielding? Are we looking for the time between the beginning of the next/seek call and the yield? I don't see anything that times next and seek calls now.....
    
    Yeah scan metrics are extremely "light" at present. Not expecting any extreme level of effort here. Yet another back-burner task of mine :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120022721
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    --- End diff --
    
    The problem is that the metrics only give me a count.  The log statement tells me where it was yielding.  However I can certainly put logging in my implemented iterator and make this a trace statement here.  I yield to your point ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    OK, I am going to attempt to merge this in tomorrow unless there are any more comments.  This includes the 2.0.0 pull request 260.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119114193
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/functional/YieldingIterator.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test.functional;
    +
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import org.apache.accumulo.core.data.ByteSequence;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.IteratorEnvironment;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.WrappingIterator;
    +import org.apache.accumulo.core.iterators.user.ScanYieldException;
    +
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * This iterator will throw a ScanYieldException every other key
    + */
    +public class YieldingIterator extends WrappingIterator {
    +  private static final Logger log = LoggerFactory.getLogger(YieldingIterator.class);
    +  private static final AtomicInteger yieldNexts = new AtomicInteger(0);
    +  private static final AtomicInteger yieldSeeks = new AtomicInteger(0);
    +  private static final AtomicInteger rebuilds = new AtomicInteger(0);
    +
    +  private static final AtomicBoolean yieldNextKey = new AtomicBoolean(false);
    +  private static final AtomicBoolean yieldSeekKey = new AtomicBoolean(false);
    +
    +  @Override
    +  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    +    YieldingIterator it = new YieldingIterator();
    +    it.setSource(it.getSource().deepCopy(env));
    +    return it;
    +  }
    +
    +  @Override
    +  public void next() throws IOException {
    +    log.info("start YieldingIterator.next: " + getTopValue());
    +    yieldNextKey.set(!yieldNextKey.get());
    +    if (yieldNextKey.get()) {
    +      log.info("YieldingIterator.next yielding: " + getTopValue());
    +      yieldNexts.incrementAndGet();
    +      throw new ScanYieldException(getTopKey());
    +    }
    +    super.next();
    +    log.info("end YieldingIterator.next: " + (hasTop() ? getTopKey() + " " + getTopValue() : "no top"));
    +  }
    +
    +  @Override
    +  public Value getTopValue() {
    +    String value = Integer.toString(yieldNexts.get()) + ',' + Integer.toString(yieldSeeks.get()) + ',' + Integer.toString(rebuilds.get());
    --- End diff --
    
    This can only be used where values are not necessary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119991613
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/SourceSwitchingIterator.java ---
    @@ -126,14 +142,23 @@ public void next() throws IOException {
     
       private void readNext(boolean initialSeek) throws IOException {
     
    +    // we need to check here if we were yielded in case the source was switched out and re-seeked by someone else (minor compaction/InMemoryMap)
    +    boolean yielded = (yield.isPresent() && yield.get().hasYielded());
    --- End diff --
    
    nit: parens make this clearer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    OK, I added metrics for the yield counts.  Please tell me if I missed anything there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118574189
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -548,11 +549,13 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
           batchTimeOut = 0;
         }
     
    +    boolean yield = false;
    +
         for (Range range : ranges) {
     
           boolean timesUp = batchTimeOut > 0 && System.nanoTime() > returnTime;
     
    -      if (exceededMemoryUsage || tabletClosed || timesUp) {
    +      if (exceededMemoryUsage || tabletClosed || timesUp || yield) {
             lookupResult.unfinishedRanges.add(range);
             continue;
    --- End diff --
    
    yes, it interrupts only the range currently being scanned.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    The root cause as far as I can tell is:
    Caused by: java.lang.NoSuchFieldError: DEFAULT_USER_SETTINGS_FILE


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119149746
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    The key placed in the ScanYieldException is not the start of the initial range.  It is the key where it left off in the underlying scan.  I am sure this is not an issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119900221
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/SortedKeyValueIterator.java ---
    @@ -147,4 +147,16 @@
        *              if not supported.
        */
       SortedKeyValueIterator<K,V> deepCopy(IteratorEnvironment env);
    +
    +  /**
    +   * Called when the client (i.e. calling iterator or the tserver) will support this iterators ability to yield. If called, then when next or seek is called
    --- End diff --
    
    Javadoc is best written in third-person, declarative (and should avoid using they word to describe the word being defined), e.g. "Allows implementations to preempt further iteration of this iterator in the current RPC. Implementations can use the `yield` method on the `callback` to instruct the caller to cease collecting more results. The yield method accepts a `Key` which is a hint to the caller to resume the scan without returning more data; that is, the `Key` set should be the last Key processed by this Iterator, either internally or externally that would safely enable this iterator to not return duplicate data or omit data. This feature is not supported for isolated scans. Most iterators do not need to implement this method."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119116363
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,106 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import org.apache.accumulo.core.client.BatchWriter;
    +import org.apache.accumulo.core.client.BatchWriterConfig;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Mutation;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.commons.lang.StringUtils;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.io.Text;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +  Logger log = LoggerFactory.getLogger(YieldScannersIT.class);
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 60;
    +  }
    +
    +  @Override
    +  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
    +    cfg.setNumTservers(1);
    +  }
    +
    +  @Test
    +  public void test() throws Exception {
    +    // make a table
    +    final String tableName = getUniqueNames(1)[0];
    +    final Connector conn = getConnector();
    +    conn.tableOperations().create(tableName);
    +    final BatchWriter writer = conn.createBatchWriter(tableName, new BatchWriterConfig());
    +    for (int i = 0; i < 10; i++) {
    +      byte[] row = new byte[] {(byte) ('a' + i)};
    +      Mutation m = new Mutation(new Text(row));
    +      m.put(new Text(), new Text(), new Value());
    +      writer.addMutation(m);
    +    }
    +    writer.flush();
    +    writer.close();
    +
    +    log.info("Creating scanner");
    +    // make a scanner for a table with 10 keys
    +    final Scanner scanner = conn.createScanner(tableName, Authorizations.EMPTY);
    +    final IteratorSetting cfg = new IteratorSetting(100, YieldingIterator.class);
    +    scanner.addScanIterator(cfg);
    +
    +    log.info("iterating");
    +    Iterator<Map.Entry<Key,Value>> it = scanner.iterator();
    +    int keyCount = 0;
    +    int yieldNextCount = 0;
    +    int yieldSeekCount = 0;
    +    while (it.hasNext()) {
    +      Map.Entry<Key,Value> next = it.next();
    +      log.info(Integer.toString(keyCount) + ": Got key " + next.getKey() + " with value " + next.getValue());
    +
    +      // verify we got the expected key
    +      char expected = (char) ('a' + keyCount);
    --- End diff --
    
    How about we just use the code point for the lowercase letter 'a' instead of an actual `'a'`? Took me a moment to realize what you were doing :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120149481
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    --- End diff --
    
    @ivakegg for the case of a transforming iterator that supports yielding, are you thinking it would it work like the following?
    
    ```java
    class MyIter implements SKVI {
      YieldCallback yc;
    
      void next() {
         source.next();
         if(yc.hasYielded()) {
           Key tmpKey = yc.getPositionAndReset();
           tmpKey = transform(tmpKey);
           yc.yield(tmpKey);
         }
      }
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120024937
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    --- End diff --
    
    >  If you want to know how often yielding is happening, you should look at the metrics you implemented and not place extra cruft in the log to just be removed in every other situation.
    
    Do you happen to know if the Hadoop Metrics2 framework picks up configuration changes at runtime like the logging subsystem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119897281
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    +      Metrics scanMetrics = getTabletServer().getScanMetrics();
    +      if (scanMetrics.isEnabled())
    +        scanMetrics.add(TabletServerScanMetrics.YIELD, 1);
    +    } else {
    +      if (iter.hasTop() == false) {
    +        endOfTabletReached = true;
    +      }
     
    -    if (endOfTabletReached && results.size() == 0)
    -      results = null;
    +      if (endOfTabletReached) {
    +        continueKey = null;
    +      }
    +
    +      if (endOfTabletReached && results.size() == 0)
    --- End diff --
    
    Probably of no consequence, but you could nest the conditionals and avoid checking `endOfTabletReached` two times.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120023630
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    --- End diff --
    
    I'd appreciate that. As a suggestion: if there are multiple areas in which you want to understand the "kinds" of yielding happening: just expose some more metrics. Have a total, and some specific breakouts of the different lines (i guess initial seek and next calls you're trying to differentiate?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119692389
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +591,11 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +        if (yield.hasYielded()) {
    +          Key yieldPosition = yield.getPositionAndReset();
    --- End diff --
    
    We should probably ensure that an iterator is not returning the same yield key. Suggest ensuring that the yieldPosition is within the scan range.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120399218
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -565,7 +572,7 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
             else
               mmfi.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
     
    -        while (mmfi.hasTop()) {
    +        while (!yield.hasYielded() && mmfi.hasTop()) {
    --- End diff --
    
    I was thinking it would be cleaner if the contract was : *when an iterator sets a yield key that its hasTop() method should start returning false*
    
    I think has the following benefits :
      * No requirement for intermediate iterator (that push the YieldCallback down) to check  yield.hasYielded().  I think makes it easier for simple iterators to support pushing the YieldCallback down.  If an iterator cares, it can still check.
      * May be more efficient to only check `hasTop()` instead of always checking `yield.hasYielded()` and `hasTop()`
    
    When `hasTop()` returns false, then the top level loop can check `yield.hasYielded()` as it currently does.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    Not that I know of, but that's based off of some old knowledge.
    
    On Jun 4, 2017 20:02, "Dave Marion" <no...@github.com> wrote:
    
    > *@dlmarion* commented on this pull request.
    > ------------------------------
    >
    > In server/tserver/src/main/java/org/apache/accumulo/tserver/
    > tablet/Tablet.java
    > <https://github.com/apache/accumulo/pull/260#discussion_r120024937>:
    >
    > >
    > -    if (endOfTabletReached) {
    > -      continueKey = null;
    > -    }
    > +      log.debug("Scan yield detected at position " + continueKey);
    >
    > If you want to know how often yielding is happening, you should look at
    > the metrics you implemented and not place extra cruft in the log to just be
    > removed in every other situation.
    >
    > Do you happen to know if the Hadoop Metrics2 framework picks up
    > configuration changes at runtime like the logging subsystem?
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/accumulo/pull/260#discussion_r120024937>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AAEJF92J2IBck7X-_6BqIlcTG0Lq1jS_ks5sA0WIgaJpZM4Nmu4X>
    > .
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120022428
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/SourceSwitchingIterator.java ---
    @@ -126,14 +142,23 @@ public void next() throws IOException {
     
       private void readNext(boolean initialSeek) throws IOException {
     
    +    // we need to check here if we were yielded in case the source was switched out and re-seeked by someone else (minor compaction/InMemoryMap)
    +    boolean yielded = (yield.isPresent() && yield.get().hasYielded());
    +
         // check of initialSeek second is intentional so that it does not short
         // circuit the call to switchSource
    -    boolean seekNeeded = (!onlySwitchAfterRow && switchSource()) || initialSeek;
    +    boolean seekNeeded = yielded || (!onlySwitchAfterRow && switchSource()) || initialSeek;
     
         if (seekNeeded)
           if (initialSeek)
             iter.seek(range, columnFamilies, inclusive);
    -      else
    +      else if (yielded) {
    +        Key yieldPosition = yield.get().getPositionAndReset();
    +        if (!range.contains(yieldPosition)) {
    +          throw new IOException("Underlying iterator yielded to a position outside of its range: " + yieldPosition + " not in " + range);
    --- End diff --
    
    Great, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120162160
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    --- End diff --
    
    Transform could possibly be handled in a push down manner also.  This could be made easier with a utility method.  Just sharing thoughts I had on this for discussion, not saying this is something we should do.
    
    ```java
    class YieldCallback {
      /**
       * Returns a new YieldCallback that wraps the passed in a transforms the key when yeild is called.
       */
      YieldCallback transform(YieldCallback yc, Function<Key, Key> keyTransform);
    }
    ```
    
    ```java
    class MyIter implements SKVI {
      @Override
      void enableYielding(YieldCallback yc) {
        source.enableYielding(YieldCallback.transform(yc, k -> transform(k));
      }
    }
    
    ``` 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > I believe the Jenkins server is borked somehow. It has failed to build for a while and the reason appears completely independent of this pull request.
    
    Ugh, nothing is easy :)
    
    > OK, I added metrics for the yield counts. Please tell me if I missed anything there.
    
    Cool. I gave a quick glance and only had one minor thought.
    
    I assume that you have something locally that you've used to test these changes in a real installation (even if just one node)? I'll try to give the final patch a glance. If Dave is already happy with it, that's cool too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118574368
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,71 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import com.google.common.collect.Iterators;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.hadoop.conf.Configuration;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 600000;
    --- End diff --
    
    right, I was having issues with Mini Accumulo on my box.  I will set that appropriately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119168100
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    Ok, thanks, Ivan!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/accumulo/pull/260


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119151013
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    + * scans to dominate all of the scan slots (readahead threads) and starve other scans out.
    + */
    +public class ScanYieldException extends RuntimeException {
    +  private final Key position;
    +
    +  public ScanYieldException(Key position) {
    --- End diff --
    
    You make a good point of making SKVI more "user-friendly".  I am considering writing an alternative implementation where I will create a yielding SKVI interface which adds a method to specify whether the the last next or seek call yielded control before finding the next top key, and a method to return the key where we decided to yield control.  This way the initial SKVI api would not change and we would not be using exceptions for flow-control.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119144555
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    > The underlying iterator would presumably seek its source with the same range. Now the seek is iterating through keys past the initial start of the range and realizes at some point that it has been passing by many of the underlying keys for whatever reason without returning control to the Tablet. Now it throws a ScanYieldException specifying where it left off in the seek.
    
    Yup, right.
    
    > Later the Tablet will come back and rebuild the iterator and seek it using the new start as specified. The seek can now continue doing its job until it actually finds a key that is to be returned. Where in this scenario is a key missed?
    
    The caveat here is that _no_ keys from this Range were returned; the SKVI would return a "position" that is the same as the startKey of the Range is was passed.
    
    ```java
    +    } catch (ScanYieldException sye) {
    +      continueKey = new Key(sye.getPosition());
    +      skipContinueKey = true;
    +    }
    ```
    
    I may be grasping at straws, but, the little bit I've read about how this method is invoked, the fact that `skipContinueKey` is set to true means that the next time we return to the Range, we construct a new Range which is non-inclusive of the original start key.
    
    Concretely:
    
    `seek([a, z])` throwing `ScanYieldException` would net a call `seek((a, z])` (instead of `seek([a,z])` again). This worried me because your test case obviously didn't cover this path. If you are sure this is not an issue, I trust you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    I am pretty satisfied this is now working as designed.  The test case uses an iterator that will throw a yield exception every other next and every other seek.  The test case also detects when the iterator has been torn down and rebuilt.  From the client perspective, nothing changes (all keys returned appropriately).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118574263
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    --- End diff --
    
    Agreed, will do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119154379
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    > The key placed in the ScanYieldException is not the start of the initial range. It is the key where it left off in the underlying scan
    
    You can't guarantee this. What I am saying is: what _if_ the key you see is the same?
    
    Your last comment makes me think that you intend for this exception to never be thrown when "no progress" was made. If that's correct, documentation needs to adequately warn..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    I believe the Jenkins server is borked somehow.  It has failed to build for a while and the reason appears completely independent of this pull request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @joshelser Given my response to @phrocker, are you satisfied?  I do output a debug message whenever an iterator yields in the Tablet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119894907
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    +  }
    +
    +  /**
    +   * Called by the client to get the yield position used as the start key (non-inclusive) of the range in a subsequent seek call when the iterator is rebuilt.
    +   * This will also reset the state returned by hasYielded.
    +   *
    +   * @return <tt>K</tt> The key position
    +   */
    +  public K getPositionAndReset() {
    --- End diff --
    
    There needs to be some form of synchronization here or advertisement in the javadoc that the caller must not concurrent invoke these methods.
    
    In normal TServer operations, would they ever be invoked across threads?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119457222
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/SortedKeyValueIterator.java ---
    @@ -147,4 +147,35 @@
        *              if not supported.
        */
       SortedKeyValueIterator<K,V> deepCopy(IteratorEnvironment env);
    +
    +  /**
    +   * Called when the client (i.e. calling iterator or the tserver) will support this iterators ability to yield. If called (and true is returned), then when
    +   * next or seek is called this iterator may yield before finding the next top value and key. When this happens, the yield position (returned by
    +   * getYieldPosition) will be used when this iterator is rebuilt to seek to the position at which it yielded. This iterator can only yield if the user of this
    +   * iterator has called this method. Note that setYieldSupported will never be called in row isolation mode.
    +   *
    +   * @return <tt>boolean</tt> True if this iterator supported yielding, false otherwise.
    +   */
    +  default boolean setYieldSupported() {
    --- End diff --
    
    suggest having a getter and setter instead of using setYieldSupported as both a setter and getter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > Is there an example of creating a new metric, or is it still all JMX based?
    
    Current "state of the art" is via Hadoop Metrics2. The metrics are exposed via JMX, but it doesn't require you do write to JMX directly :). A rough outline would be...
    
    1. Add a metrics key to https://github.com/apache/accumulo/blob/f81a8ec7410e789d11941351d5899b8894c6a322/server/tserver/src/main/java/org/apache/accumulo/tserver/metrics/TabletServerScanMetricsKeys.java
    2. Update `add(String, long)` in https://github.com/apache/accumulo/blob/f81a8ec7410e789d11941351d5899b8894c6a322/server/tserver/src/main/java/org/apache/accumulo/tserver/metrics/Metrics2TabletServerScanMetrics.java (yuck, this is gross)
    3. Update the RPC server implementation/code-path to invoke the metrics methods a la https://github.com/apache/accumulo/blob/b78edd3329cb7c981329fa2038a9352f79ab466d/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L629-L632


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120448398
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +593,19 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +        if (yield.hasYielded()) {
    +          yielded = true;
    +          Key yieldPosition = yield.getPositionAndReset();
    --- End diff --
    
    ok



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r118567242
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -548,11 +549,13 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
           batchTimeOut = 0;
         }
     
    +    boolean yield = false;
    +
         for (Range range : ranges) {
     
           boolean timesUp = batchTimeOut > 0 && System.nanoTime() > returnTime;
     
    -      if (exceededMemoryUsage || tabletClosed || timesUp) {
    +      if (exceededMemoryUsage || tabletClosed || timesUp || yield) {
             lookupResult.unfinishedRanges.add(range);
             continue;
    --- End diff --
    
    The way the Exception's javadoc reads, it would preempty the entire Scan RPC, but it's actually just pre-empting one Range. Is that intentional?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119143269
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    + * scans to dominate all of the scan slots (readahead threads) and starve other scans out.
    + */
    +public class ScanYieldException extends RuntimeException {
    +  private final Key position;
    +
    +  public ScanYieldException(Key position) {
    --- End diff --
    
    > It appears that I have not adequately described the process here. Here is a scenario that might help understanding:
    
    No, I think I understand the control flow. I'm more worried about how we pass along this "tribal knowledge" to developers who stumble onto this Exception/code-path. As we try to make SKVI more "user-friendly", I'm trying to make sure that new things which are added stand up to the level of "ready for users" and don't increase our complexity-debt :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119107638
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/SortedKeyValueIterator.java ---
    @@ -69,6 +70,10 @@
        *              if called before seek.
        * @exception NoSuchElementException
        *              if next element doesn't exist.
    +   * @exception ScanYieldException
    +   *              Thrown if the iterator is permitting the scan to be torn down to allow other scans to use this thread. An iterator may throw this if it had
    --- End diff --
    
    s/if it had/if it has/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @phrocker 
    > Yields are not aware of other yields and thus are completely independent and thus cannot cooperate with any scheduling mechanism. My old Operating System book calls this "uncooperative yielding." But I can see how this can be confusing. Let's call it isolated yielding.
    
    Taking a quote from: https://en.wikipedia.org/wiki/Cooperative_multitasking:
    
    > Cooperative multitasking, also known as non-preemptive multitasking, is a style of computer multitasking in which the operating system never initiates a context switch from a running process to another process. Instead, processes voluntarily yield control periodically or when idle in order to enable multiple applications to be run simultaneously.
    
    I believe this is (at least in part) the capability I am providing here, the ability for the scan (a process in some sense) to voluntarily yield control.  This is why I used the word cooperative.  Granted the scheduler in this case does not require all iterators to be able to yield so in that sense we have a mix of cooperative and preemptive/uncooperative multitasking.  I submit in that I am stretching the definition, so "isolated yielding" it is.
    
    > To your point that "they do it to themselves." Well, since an iterator is one amongst a stack and you could have a multi-user system, if you had one iterator that would skip just five more keys before completing, but is pre-empted due to another iterator, you have the potential for a yield when one is not desired.
    
    One iterator cannot pre-empt another unless the first iterator explicitly enables it to do so (see enableYielding(callback) on SKVI).  By enabling yielding on an iterator/source, then that iterator must be able to deal with that iterator yielding after any seek or next call.  That is part of the contract.  Note that the only way to actually yield the scan is for the top level iterator to do so.  No iterator is required to call enableYielding on the iterator below it.  The Tablet will only call enableYielding on the top level iterator.  Since the top level iterator may be the SourceSwitchingIterator, I made sure that it can handle this and passes this on to the next iterator below.
    
    > The only way to combat this would be solid metrics. You don't know how many increased RPC calls there are. This can increase RPCs if you simply set the key yield incorrectly. You don't know I/O load and how many keys being skipped is reasonable without these metrics. Further, one key is not the same as another key. Parts of a table could have much smaller keys, so again, these metrics prove everything by telling us: how much time spent before yield, size of keys skipped, etc, etc
    > Hence those metrics would be useful to show if this mechanism works as intended in production.
    
    OK, I will see if I can add a metric to the accumulo metrics mechanism.
    
    > Then, after metrics, a nice to have would be a mechanism that allows the entire scan to stop. If you are going to put a limit and "yield." You must have a cessation point. Agree that long running scans can happen, but the RPC increase and context switching is a problem that we cannot stop with the current solution. You also need a point at which you have yielded enough and thus must stop entirely.
    
    Sounds like a reasonable feature.  Please write a ticket.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @joshelser OK, I will see if I can add a metric to the accumulo metrics mechanism.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119992351
  
    --- Diff: iterator-test-harness/src/main/java/org/apache/accumulo/iteratortest/testcases/YieldingTestCase.java ---
    @@ -0,0 +1,77 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.iteratortest.testcases;
    +
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.YieldCallback;
    +import org.apache.accumulo.iteratortest.IteratorTestInput;
    +import org.apache.accumulo.iteratortest.IteratorTestOutput;
    +import org.apache.accumulo.iteratortest.IteratorTestUtil;
    +import org.apache.accumulo.iteratortest.environments.SimpleIteratorEnvironment;
    +
    +import java.io.IOException;
    +import java.util.TreeMap;
    +
    +/**
    + * Test case that verifies that an iterator works correctly with the yielding api. Note that most iterators do nothing in terms of yielding in which case this
    + * merely tests that the iterator produces the correct output. If however the iterator does override the yielding api, then this ensures that it works correctly
    + * iff the iterator actually decides to yield. Nothing can force an iterator to yield without knowing something about the internals of the iterator being
    + * tested.
    + */
    +public class YieldingTestCase extends OutputVerifyingTestCase {
    --- End diff --
    
    I thought that this structure was a reasonable mechanism to ensure that the basic expectations of an iterator are met.  In this case, I am not positive it adds much because this test case by itself cannot force the implemented iterator to yield.  However if the implementer could provide a mechanism to force his/her iterator to yield, then this might prove useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120447578
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    --- End diff --
    
    Alternatively the transform could be encoded in the instance of the YieldCallback passed in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @keith-turner re: Some comments from working on keith-turner/yield-test
    
    >    YieldCallback java docs on setting yield key could mention exclusiveness
    
    What does this mean?
    
    >    When I had many long running scans AND made my yielding filter yield after 1,000 keys THEN I saw 80% CPU utilitzation. This made me think things were not being immediately rescheduled.
    
    Are you suggesting another jira ticket here?  I have not changed any of the scan scheduling mechanism.
    
    >    It would be nice to add @since 2.0.0 to java docs in new SKVI method.... and maybe for YieldCallback class.
    
    ok, shall do



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119992411
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -746,16 +774,29 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
           iter.next();
         }
     
    -    if (iter.hasTop() == false) {
    -      endOfTabletReached = true;
    -    }
    +    if (yield.hasYielded()) {
    +      continueKey = new Key(yield.getPositionAndReset());
    +      skipContinueKey = true;
    +      if (!range.contains(continueKey)) {
    +        throw new IOException("Underlying iterator yielded to a position outside of its range: " + continueKey + " not in " + range);
    +      }
     
    -    if (endOfTabletReached) {
    -      continueKey = null;
    -    }
    +      log.debug("Scan yield detected at position " + continueKey);
    +      Metrics scanMetrics = getTabletServer().getScanMetrics();
    +      if (scanMetrics.isEnabled())
    +        scanMetrics.add(TabletServerScanMetrics.YIELD, 1);
    +    } else {
    +      if (iter.hasTop() == false) {
    +        endOfTabletReached = true;
    +      }
     
    -    if (endOfTabletReached && results.size() == 0)
    -      results = null;
    +      if (endOfTabletReached) {
    +        continueKey = null;
    +      }
    +
    +      if (endOfTabletReached && results.size() == 0)
    --- End diff --
    
    fair enough...shall do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > The root cause as far as I can tell is:
    Caused by: java.lang.NoSuchFieldError: DEFAULT_USER_SETTINGS_FILE
    
    See: https://lists.apache.org/thread.html/cb39a537a23af83b56522160e8c02938c34a76553b03e8a3aba3b662@%3Cbuilds.apache.org%3E
    
    Should be resolved with INFRA by as soon as there is a Jenkins release that has the fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119618567
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +591,11 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +        if (yield.hasYielded()) {
    +          Key yieldPosition = yield.getPositionAndReset();
    +          log.debug("Scan yield exception detected at position " + yieldPosition);
    --- End diff --
    
    good catch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119120650
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/user/ScanYieldException.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators.user;
    +
    +import org.apache.accumulo.core.data.Key;
    +
    +/**
    + * This exception can be thrown at from a next or seek call on an iterator to allow other scans to get time slots. This mechanism is intended to avoid a set of
    + * scans to dominate all of the scan slots (readahead threads) and starve other scans out.
    + */
    +public class ScanYieldException extends RuntimeException {
    +  private final Key position;
    +
    +  public ScanYieldException(Key position) {
    --- End diff --
    
    Can we make this more specific as to what `Key` should be passed in? If I understand correctly:
    
    1) For seek(), the start key of the Range passed to the Iterator
    2) For next(), the last key returned by the Iterator
    
    Maybe there is a third case for when seek() also consumes data before returning the call (finding the first key in the range which matches some condition)? I'm not sure of a concise way to state that :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119157139
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/functional/YieldingIterator.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test.functional;
    +
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import org.apache.accumulo.core.data.ByteSequence;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.IteratorEnvironment;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.WrappingIterator;
    +import org.apache.accumulo.core.iterators.user.ScanYieldException;
    +
    +import java.io.IOException;
    +import java.util.Collection;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * This iterator will throw a ScanYieldException every other key
    + */
    --- End diff --
    
    A more detailed description would be very helpful.  You could basically sum up all your comments and capture them here so future generations can benefit as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    LGTM, might need more tests for the edge cases. May also want to look at the iterator test harness to see if it needs to be modified.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120023595
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    --- End diff --
    
    Dry humor often falls flat online, especially when dealing with criticism. Thanks for clarifying. No need to buy any beers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119895624
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    --- End diff --
    
    nit: unnecessary parens


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @joshelser We are in the process of getting this put onto a larger cluster to test its efficacy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    Some comments from working on [keith-turner/yield-test](https://github.com/keith-turner/yield-test)
    
     * YieldCallback java docs on setting yield key could mention exclusiveness
     * When I had many long running scans AND made my yielding filter yield after 1,000 keys THEN I saw 80% CPU utilitzation.  This made me think things were not being immediately rescheduled.
      * It would be nice to add `@since 2.0.0` to java docs in new SKVI method.... and maybe for YieldCallback class.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119134520
  
    --- Diff: test/src/main/java/org/apache/accumulo/test/YieldScannersIT.java ---
    @@ -0,0 +1,106 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.test;
    +
    +import org.apache.accumulo.core.client.BatchWriter;
    +import org.apache.accumulo.core.client.BatchWriterConfig;
    +import org.apache.accumulo.core.client.Connector;
    +import org.apache.accumulo.core.client.IteratorSetting;
    +import org.apache.accumulo.core.client.Scanner;
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Mutation;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.security.Authorizations;
    +import org.apache.accumulo.harness.AccumuloClusterHarness;
    +import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;
    +import org.apache.accumulo.test.functional.YieldingIterator;
    +import org.apache.commons.lang.StringUtils;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.io.Text;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import java.util.Iterator;
    +import java.util.Map;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +// ACCUMULO-4643
    +public class YieldScannersIT extends AccumuloClusterHarness {
    +  Logger log = LoggerFactory.getLogger(YieldScannersIT.class);
    +
    +  @Override
    +  public int defaultTimeoutSeconds() {
    +    return 60;
    +  }
    +
    +  @Override
    +  public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
    +    cfg.setNumTservers(1);
    +  }
    +
    +  @Test
    +  public void test() throws Exception {
    +    // make a table
    +    final String tableName = getUniqueNames(1)[0];
    +    final Connector conn = getConnector();
    +    conn.tableOperations().create(tableName);
    +    final BatchWriter writer = conn.createBatchWriter(tableName, new BatchWriterConfig());
    +    for (int i = 0; i < 10; i++) {
    +      byte[] row = new byte[] {(byte) ('a' + i)};
    +      Mutation m = new Mutation(new Text(row));
    +      m.put(new Text(), new Text(), new Value());
    +      writer.addMutation(m);
    +    }
    +    writer.flush();
    +    writer.close();
    +
    +    log.info("Creating scanner");
    +    // make a scanner for a table with 10 keys
    +    final Scanner scanner = conn.createScanner(tableName, Authorizations.EMPTY);
    +    final IteratorSetting cfg = new IteratorSetting(100, YieldingIterator.class);
    +    scanner.addScanIterator(cfg);
    +
    +    log.info("iterating");
    +    Iterator<Map.Entry<Key,Value>> it = scanner.iterator();
    +    int keyCount = 0;
    +    int yieldNextCount = 0;
    +    int yieldSeekCount = 0;
    +    while (it.hasNext()) {
    +      Map.Entry<Key,Value> next = it.next();
    +      log.info(Integer.toString(keyCount) + ": Got key " + next.getKey() + " with value " + next.getValue());
    +
    +      // verify we got the expected key
    +      char expected = (char) ('a' + keyCount);
    --- End diff --
    
    Not too sure how using a code point would be any clearer than specifying 'a' as the code point.....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    I reworked the pull request to use interface methods instead of throwing exceptions.  This also ensures that the yielding mechanism will not interrupt when using row isolation mode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    One final (hopefully ;-)) thing.. can you add a metric to capture how often this exception is thrown/caught?
    
    If this is used by developer-written iterators on shared hardware, it could drastically change the RPC patterns on the system.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    I recommend improving the log message before this gets merged. "initial implementation" doesn't quite explain what it does. The message has a reference to a JIRA, but a brief description within the git history is still nice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119119845
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -715,47 +722,54 @@ Batch nextBatch(SortedKeyValueIterator<Key,Value> iter, Range range, int num, Se
     
         long maxResultsSize = tableConfiguration.getAsBytes(Property.TABLE_SCAN_MAXMEM);
     
    -    if (columns.size() == 0) {
    -      iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    -    } else {
    -      iter.seek(range, LocalityGroupUtil.families(columns), true);
    -    }
    -
         Key continueKey = null;
         boolean skipContinueKey = false;
     
         boolean endOfTabletReached = false;
    -    while (iter.hasTop()) {
     
    -      value = iter.getTopValue();
    -      key = iter.getTopKey();
    +    try {
    +      if (columns.size() == 0) {
    +        iter.seek(range, LocalityGroupUtil.EMPTY_CF_SET, false);
    --- End diff --
    
    If `seek` throws the `ScanYieldException` and there exists a Key in the data which is the same as the startKey of the Range, I believe we would miss data.
    
    `skipContinueKey` is set to true (in the catch below), but when we're `seek()`'ing here, we haven't yet consumed that top key. Thus, on a re-seek, we would skip past it.
    
    My hunch is that you need a two catch statements: one here that does not skip the continueKey and the one below around the `while(hasTop)` (as-is).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @keith-turner You can thank @dlmarion for the brilliant callback idea.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119991581
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/SortedKeyValueIterator.java ---
    @@ -147,4 +147,16 @@
        *              if not supported.
        */
       SortedKeyValueIterator<K,V> deepCopy(IteratorEnvironment env);
    +
    +  /**
    +   * Called when the client (i.e. calling iterator or the tserver) will support this iterators ability to yield. If called, then when next or seek is called
    --- End diff --
    
    ok, whatever will make it more clear.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by dlmarion <gi...@git.apache.org>.
Github user dlmarion commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120410868
  
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java ---
    @@ -586,6 +593,19 @@ private LookupResult lookup(SortedKeyValueIterator<Key,Value> mmfi, List<Range>
               mmfi.next();
             }
     
    +        if (yield.hasYielded()) {
    +          yielded = true;
    +          Key yieldPosition = yield.getPositionAndReset();
    --- End diff --
    
    Should we check that yieldPosition is following the last KVEntry in the results object?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r119896647
  
    --- Diff: iterator-test-harness/src/main/java/org/apache/accumulo/iteratortest/testcases/YieldingTestCase.java ---
    @@ -0,0 +1,77 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.iteratortest.testcases;
    +
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Range;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    +import org.apache.accumulo.core.iterators.YieldCallback;
    +import org.apache.accumulo.iteratortest.IteratorTestInput;
    +import org.apache.accumulo.iteratortest.IteratorTestOutput;
    +import org.apache.accumulo.iteratortest.IteratorTestUtil;
    +import org.apache.accumulo.iteratortest.environments.SimpleIteratorEnvironment;
    +
    +import java.io.IOException;
    +import java.util.TreeMap;
    +
    +/**
    + * Test case that verifies that an iterator works correctly with the yielding api. Note that most iterators do nothing in terms of yielding in which case this
    + * merely tests that the iterator produces the correct output. If however the iterator does override the yielding api, then this ensures that it works correctly
    + * iff the iterator actually decides to yield. Nothing can force an iterator to yield without knowing something about the internals of the iterator being
    + * tested.
    + */
    +public class YieldingTestCase extends OutputVerifyingTestCase {
    --- End diff --
    
    Thanks for writing this! Outside of the context of the review, I'm curious to hear your thoughts on the usefulness of this structure. Aside from Dylan, i think you're the only other person "brave" enough to dig in :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > I do output a debug message whenever an iterator yields in the Tablet.
    
    Not really, no. Logging is not metrics.
    
    If it's not clear, I would consider myself a -0 with this lacking. We should be striving to expose good metrics (through a standardized interface) when we're making server-side changes like these. However, we don't have rigor in Accumulo to do this, so I'm not going to hold an axe over your head.
    
    I also haven't looked at your lastest (few?) patches which contributes to me not being supportive (not your fault, obviously).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #260: ACCUMULO-4643 initial implementation

Posted by ivakegg <gi...@git.apache.org>.
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    > Awesome :D. Should I take that as avoiding merging this until that testing is complete?
    Yes, hold off on merging.  @keith-turner requested that we pound this a little first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #260: ACCUMULO-4643 initial implementation

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/260#discussion_r120022426
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/YieldCallback.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.iterators;
    +
    +/**
    + * This callback handles the state of yielding within an iterator
    + */
    +public class YieldCallback<K> {
    +  private K key;
    +
    +  /**
    +   * Called by the iterator when a next or seek call yields control.
    +   *
    +   * @param key
    +   *          the key position at which the iterator yielded.
    +   */
    +  public void yield(K key) {
    +    this.key = key;
    +  }
    +
    +  /**
    +   * Called by the client to see if the iterator yielded
    +   *
    +   * @return true if iterator yielded control
    +   */
    +  public boolean hasYielded() {
    +    return (this.key != null);
    +  }
    +
    +  /**
    +   * Called by the client to get the yield position used as the start key (non-inclusive) of the range in a subsequent seek call when the iterator is rebuilt.
    +   * This will also reset the state returned by hasYielded.
    +   *
    +   * @return <tt>K</tt> The key position
    +   */
    +  public K getPositionAndReset() {
    --- End diff --
    
    Great. That's why I asked.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---