You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@accumulo.apache.org by joshelser <gi...@git.apache.org> on 2017/04/11 00:04:08 UTC

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

GitHub user joshelser opened a pull request:

    https://github.com/apache/accumulo/pull/247

    ACCUMULO-3208 Integration test for the OrIterator and cleanup

    The OrIterator was in very bad shape, with next to no documentation
    about what it actually does.
    
    Put this against 1.8 for fun. Can rebase wherever.
    
    FYI @milleruntime @keith-turner @phrocker 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joshelser/accumulo 3208-OrIterator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #247
    
----
commit d42e4207aeba60eb4d6c0fac0a48113d1a0ea1a7
Author: Josh Elser <el...@apache.org>
Date:   2017-04-10T02:45:56Z

    ACCUMULO-3208 Integration test for the OrIterator and cleanup
    
    The OrIterator was in very bad shape, with next to no documentation
    about what it actually does.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    \U0001f44d  Thanks Josh, this is great!  I didn't want to have to follow through with my threatening PR :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110996811
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
    
    > this implementation is optimized and returns an unsorted list.
    
    It's not an optimization. It's the implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    > Hah! Did you happen to look at this with an eye for clarity/simplification? One of my goals was to simplify the code paths (without breaking "optimizations") while also leaving better comments explaining what each path was doing.
    
    I am looking deeper.  I just cloned your repo and I'm looking at the code through my IDE... the true clarity test!  I also want to step through the IT you created.  I wanted to give my initial response to the PR above since it was very positive.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/247#discussion_r111025624

--- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
@@ -39,29 +39,44 @@
import org.slf4j.LoggerFactory;

/**
- * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
+ * An iterator that provides a sorted-iteration of column qualifiers for a set of column families in a row.
+ * It is important to note that this iterator <em>does not</em> adhere to the contract set forth by the
+ * {@link SortedKeyValueIterator}. It returns Keys in {@code row+colqual} order instead of
+ * {@code row+colfam+colqual} order. This is required for the implementation of this iterator (to work
+ * in conjunction with the {@code IntersectingIterator}) but is a code-smell. This iterator should only
--- End diff --

> I don't think you have to insult the hygiene of our code unless you know of a use case where this design could cause problems

Is this tongue-in-cheek? I'm not sure if you think this comment is actually offensive to other developers.

It really is *bad*, IMO. "Apache Accumulo� is a sorted, distributed key/value store that..." and we're providing an Iterator that, by all definitions, should be returning sorted data but is not. The intended use-case of this iterator is sitting behind the scenes (not feeding data directly to a Scanner/BatchScanner), which is why it's OK. I think it's important to let users know who venture here what they're getting into because it's unlike any (all?) other SKVI's that we provide.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110991222
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 steve:3
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    --- End diff --
    
    I assume this was done as an optimization.  Since it fully implements SKVI (with that one exception), you could rename the class to something like "OrIteratorUnsorted" or "OptimizedOrIterator" making the exception more obvious.  Then someone could extend it or create another (albeit less optimized) iterator that returns sorted data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111157714
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -39,29 +39,44 @@
     import org.slf4j.LoggerFactory;
     
     /**
    - * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + * An iterator that provides a sorted-iteration of column qualifiers for a set of column families in a row.
    + * It is important to note that this iterator <em>does not</em> adhere to the contract set forth by the
    + * {@link SortedKeyValueIterator}. It returns Keys in {@code row+colqual} order instead of
    + * {@code row+colfam+colqual} order. This is required for the implementation of this iterator (to work
    + * in conjunction with the {@code IntersectingIterator}) but is a code-smell. This iterator should only
    --- End diff --
    
    Yes it was tongue-in-cheek. I had to google "code smell" since I hadn't heard that phrase before but I don't think its offensive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110951131
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
    --- End diff --
    
    Should state this iterator requires setting of the COLUMNS_KEY option.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110951592
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
    --- End diff --
    
    Good point. The way is was written before required you to write your own iterator to use it (adding `TermSource`'s on your own). I'll expand the class-level javadoc to better describe these two usages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111001594
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
    
    Look at the class level javadoc again. It's sorting by row and then colqual, not row then colfam. e.g. "row1 bob:4" should come before "row1 steve:3" but it will not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    Sounds good.  It should be an easy merge with 1.7 as well since it hasn't changed so why not?  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111005815
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
    
    Ahhhhhh OK I see now.  Thanks for the clarification.  Sorry when you said it returned records out of order, I assumed you meant they were unsorted.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/247#discussion_r111024504

--- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
@@ -39,29 +39,44 @@
import org.slf4j.LoggerFactory;

Very clear and concise but not pedantic.

> but is a code-smell

I don't think you have to insult the hygiene of our code unless you know of a use case where this design could cause problems. Otherwise I would just say this iterator sorts differently by design.

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    > I also want to step through the IT you created.
    
    So the problem with this is that the TabletServer, and thus the OrIterator, are running in a separate JVM. It's not easy to just debug this IT like it is with any other unit test.
    
    Creating a MiniAccumuloCluster that can run the Accumulo services in the same JVM would be extremely beneficial (https://issues.apache.org/jira/browse/ACCUMULO-2739). I will say, debugging "minicluster" tests in HBase is monumentally easier than it is in Accumulo. I just don't have the time to revisit this one, sadly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110995255
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
    
    Since this queue is not actually sorted, this name is very misleading.  I would rename to something like "validSources".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111170295
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +128,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
         }
    +
    +    /**
    +     * Converts the given {@code Range} into the correct {@code Range} for this TermSource (per this expected table structure) and then seeks this TermSource's
    +     * SKVI.
    +     */
    +    public void seek(Range originalRange) throws IOException {
    +      // the infinite start key is equivalent to a null startKey on the Range.
    +      if (!originalRange.isInfiniteStartKey()) {
    +        Key originalStartKey = originalRange.getStartKey();
    +        // Pivot the provided range into the range for this term
    +        Key newKey = new Key(originalStartKey.getRow(), term, originalStartKey.getColumnQualifier(), originalStartKey.getTimestamp());
    +        // Construct the new range, preserving the other attributes on the provided range.
    +        currentRange = new Range(newKey, originalRange.isStartKeyInclusive(), originalRange.getEndKey(), originalRange.isEndKeyInclusive());
    +      } else {
    +        currentRange = originalRange;
    +      }
    +      LOG.trace("Seeking {} to {}", this, currentRange);
    +      iter.seek(currentRange, seekColfams, true);
    +    }
    +
    +    @Override
    +    public String toString() {
    +      StringBuilder sb = new StringBuilder();
    +      sb.append("TermSource{term=").append(term).append(", currentRange=").append(currentRange).append("}");
    +      return sb.toString();
    +    }
    +
    +    /**
    +     * @return True if there is a valid topKey which falls into the range this TermSource's iterator was last seeked to, false otherwise.
    +     */
    +    boolean hasEntryForTerm() {
    +      if (!iter.hasTop()) {
    +        return false;
    +      }
    +      return currentRange.contains(iter.getTopKey());
    +    }
       }
     
       public OrIterator() {
    -    this.sources = new ArrayList<>();
    +    this.sources = Collections.emptyList();
       }
     
       private OrIterator(OrIterator other, IteratorEnvironment env) {
    -    this.sources = new ArrayList<>();
    +    ArrayList<TermSource> copiedSources = new ArrayList<>();
     
         for (TermSource TS : other.sources)
    -      this.sources.add(new TermSource(TS.iter.deepCopy(env), TS.term));
    +      copiedSources.add(new TermSource(TS.iter.deepCopy(env), new Text(TS.term)));
    +    this.sources = Collections.unmodifiableList(copiedSources);
       }
     
       @Override
       public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
         return new OrIterator(this, env);
       }
     
    -  public void addTerm(SortedKeyValueIterator<Key,Value> source, Text term, IteratorEnvironment env) {
    -    this.sources.add(new TermSource(source.deepCopy(env), term));
    +  public void setTerms(SortedKeyValueIterator<Key,Value> source, Collection<String> terms, IteratorEnvironment env) {
    +    ArrayList<TermSource> newTerms = new ArrayList<>();
    +    for (String term : terms) {
    +      newTerms.add(new TermSource(source.deepCopy(env), new Text(term)));
    +    }
    +    this.sources = Collections.unmodifiableList(newTerms);
       }
     
       @Override
       final public void next() throws IOException {
    -
    +    LOG.trace("next()");
         if (currentTerm == null)
           return;
     
         // Advance currentTerm
         currentTerm.iter.next();
    --- End diff --
    
    I'm not aware of any exceptions bubbling up out of an iterator if we call `next()` at the end of the iteration. Whether or not `currentTerm` is `null` is essentially making the same check (we never preserve the `TermSource` as `currentTerm` or in the heap after it's been exhausted). Down below, we do that check proactively (do we have more entries for this TermSource) before preserving it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111169081
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +128,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
         }
    +
    +    /**
    +     * Converts the given {@code Range} into the correct {@code Range} for this TermSource (per this expected table structure) and then seeks this TermSource's
    +     * SKVI.
    +     */
    +    public void seek(Range originalRange) throws IOException {
    +      // the infinite start key is equivalent to a null startKey on the Range.
    +      if (!originalRange.isInfiniteStartKey()) {
    +        Key originalStartKey = originalRange.getStartKey();
    +        // Pivot the provided range into the range for this term
    +        Key newKey = new Key(originalStartKey.getRow(), term, originalStartKey.getColumnQualifier(), originalStartKey.getTimestamp());
    +        // Construct the new range, preserving the other attributes on the provided range.
    +        currentRange = new Range(newKey, originalRange.isStartKeyInclusive(), originalRange.getEndKey(), originalRange.isEndKeyInclusive());
    --- End diff --
    
    If we blindly set inclusivity on the start key, we will infinite loop in the small scan max memory condition and a tablet with multiple rows (you could try this if you want with the one integration test included).
    
    On the re-seek, we'll get a range that looks like `(row term:prev_matched_docid, +inf)` (or whatever the end key is). But, if a user provides a specific row/range, we might get `[row term:docid, row term:docid\x00)`. Preserving the startKeyInclusivity handles both cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    > this is great! I didn't want to have to follow through with my threatening PR :)
    
    Hah!  Did you happen to look at this with an eye for clarity/simplification? One of my goals was to simplify the code paths (without breaking "optimizations") while also leaving better comments explaining what each path was doing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110965861
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
    
    Why ColumnQualifer and not ColumnFamily?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    Seems like Marc and Mike are happy with this so I'd like to merge it today.
    
    Where do we want this? 1.7? 1.8? just 2.0?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by phrocker <gi...@git.apache.org>.

Github user phrocker commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111037441
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +128,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
         }
    +
    +    /**
    +     * Converts the given {@code Range} into the correct {@code Range} for this TermSource (per this expected table structure) and then seeks this TermSource's
    +     * SKVI.
    +     */
    +    public void seek(Range originalRange) throws IOException {
    +      // the infinite start key is equivalent to a null startKey on the Range.
    +      if (!originalRange.isInfiniteStartKey()) {
    +        Key originalStartKey = originalRange.getStartKey();
    +        // Pivot the provided range into the range for this term
    +        Key newKey = new Key(originalStartKey.getRow(), term, originalStartKey.getColumnQualifier(), originalStartKey.getTimestamp());
    +        // Construct the new range, preserving the other attributes on the provided range.
    +        currentRange = new Range(newKey, originalRange.isStartKeyInclusive(), originalRange.getEndKey(), originalRange.isEndKeyInclusive());
    +      } else {
    +        currentRange = originalRange;
    +      }
    +      LOG.trace("Seeking {} to {}", this, currentRange);
    +      iter.seek(currentRange, seekColfams, true);
    +    }
    +
    +    @Override
    +    public String toString() {
    +      StringBuilder sb = new StringBuilder();
    +      sb.append("TermSource{term=").append(term).append(", currentRange=").append(currentRange).append("}");
    +      return sb.toString();
    +    }
    +
    +    /**
    +     * @return True if there is a valid topKey which falls into the range this TermSource's iterator was last seeked to, false otherwise.
    +     */
    +    boolean hasEntryForTerm() {
    +      if (!iter.hasTop()) {
    +        return false;
    +      }
    +      return currentRange.contains(iter.getTopKey());
    +    }
       }
     
       public OrIterator() {
    -    this.sources = new ArrayList<>();
    +    this.sources = Collections.emptyList();
       }
     
       private OrIterator(OrIterator other, IteratorEnvironment env) {
    -    this.sources = new ArrayList<>();
    +    ArrayList<TermSource> copiedSources = new ArrayList<>();
     
         for (TermSource TS : other.sources)
    -      this.sources.add(new TermSource(TS.iter.deepCopy(env), TS.term));
    +      copiedSources.add(new TermSource(TS.iter.deepCopy(env), new Text(TS.term)));
    +    this.sources = Collections.unmodifiableList(copiedSources);
       }
     
       @Override
       public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
         return new OrIterator(this, env);
       }
     
    -  public void addTerm(SortedKeyValueIterator<Key,Value> source, Text term, IteratorEnvironment env) {
    -    this.sources.add(new TermSource(source.deepCopy(env), term));
    +  public void setTerms(SortedKeyValueIterator<Key,Value> source, Collection<String> terms, IteratorEnvironment env) {
    +    ArrayList<TermSource> newTerms = new ArrayList<>();
    +    for (String term : terms) {
    +      newTerms.add(new TermSource(source.deepCopy(env), new Text(term)));
    +    }
    +    this.sources = Collections.unmodifiableList(newTerms);
       }
     
       @Override
       final public void next() throws IOException {
    -
    +    LOG.trace("next()");
         if (currentTerm == null)
           return;
     
         // Advance currentTerm
         currentTerm.iter.next();
    --- End diff --
    
    Why was has next top called here to avoid an exception (IllegalStateException I believe ? )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110997152
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 steve:3
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    --- End diff --
    
    > I assume this was done as an optimization.
    
    No, it's the design of the iterator. It wouldn't function correctly if it returned in non-docId sorted order.
    
    >  Since it fully implements SKVI
    
    Yes, it implements all the methods of SKVI (I mean, it wouldn't compile otherwise). The issue is that it doesn't adhere to the runtime contract. While iterating over any SKVI, the Keys should always be increasing in sort-order. This iterator, by design, does not do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by ctubbsii <gi...@git.apache.org>.

Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    +1 for 1.7+


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    @milleruntime want to let me know if the clarifications to the class-level javadoc in e459c12 are sufficient? Tried to make them a bit more consumable without being too pedantic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
  
    Good enough for me. Will land on 1.7. Thanks folks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by phrocker <gi...@git.apache.org>.

Github user phrocker commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111037495
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +128,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
         }
    +
    +    /**
    +     * Converts the given {@code Range} into the correct {@code Range} for this TermSource (per this expected table structure) and then seeks this TermSource's
    +     * SKVI.
    +     */
    +    public void seek(Range originalRange) throws IOException {
    +      // the infinite start key is equivalent to a null startKey on the Range.
    +      if (!originalRange.isInfiniteStartKey()) {
    +        Key originalStartKey = originalRange.getStartKey();
    +        // Pivot the provided range into the range for this term
    +        Key newKey = new Key(originalStartKey.getRow(), term, originalStartKey.getColumnQualifier(), originalStartKey.getTimestamp());
    +        // Construct the new range, preserving the other attributes on the provided range.
    +        currentRange = new Range(newKey, originalRange.isStartKeyInclusive(), originalRange.getEndKey(), originalRange.isEndKeyInclusive());
    +      } else {
    +        currentRange = originalRange;
    +      }
    +      LOG.trace("Seeking {} to {}", this, currentRange);
    +      iter.seek(currentRange, seekColfams, true);
    +    }
    +
    +    @Override
    +    public String toString() {
    +      StringBuilder sb = new StringBuilder();
    +      sb.append("TermSource{term=").append(term).append(", currentRange=").append(currentRange).append("}");
    +      return sb.toString();
    +    }
    +
    +    /**
    +     * @return True if there is a valid topKey which falls into the range this TermSource's iterator was last seeked to, false otherwise.
    +     */
    +    boolean hasEntryForTerm() {
    +      if (!iter.hasTop()) {
    +        return false;
    +      }
    +      return currentRange.contains(iter.getTopKey());
    +    }
       }
     
       public OrIterator() {
    -    this.sources = new ArrayList<>();
    +    this.sources = Collections.emptyList();
       }
     
       private OrIterator(OrIterator other, IteratorEnvironment env) {
    -    this.sources = new ArrayList<>();
    +    ArrayList<TermSource> copiedSources = new ArrayList<>();
     
         for (TermSource TS : other.sources)
    -      this.sources.add(new TermSource(TS.iter.deepCopy(env), TS.term));
    +      copiedSources.add(new TermSource(TS.iter.deepCopy(env), new Text(TS.term)));
    +    this.sources = Collections.unmodifiableList(copiedSources);
       }
     
       @Override
       public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
         return new OrIterator(this, env);
       }
     
    -  public void addTerm(SortedKeyValueIterator<Key,Value> source, Text term, IteratorEnvironment env) {
    -    this.sources.add(new TermSource(source.deepCopy(env), term));
    +  public void setTerms(SortedKeyValueIterator<Key,Value> source, Collection<String> terms, IteratorEnvironment env) {
    +    ArrayList<TermSource> newTerms = new ArrayList<>();
    +    for (String term : terms) {
    +      newTerms.add(new TermSource(source.deepCopy(env), new Text(term)));
    +    }
    +    this.sources = Collections.unmodifiableList(newTerms);
       }
     
       @Override
       final public void next() throws IOException {
    -
    +    LOG.trace("next()");
         if (currentTerm == null)
           return;
     
         // Advance currentTerm
         currentTerm.iter.next();
    --- End diff --
    
    I could be wrong that one is thrown. I'm currently not in a position to check this, so ignore me if I'm wrong. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110950549
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    --- End diff --
    
    > This code has been adapted/merged from Heap and Multi Iterators.
    
    It is not clear to me what this means.  Does this add anything for someone with a deep understanding of Heap and Multi iterators?  If not, maybe just remove this sentence.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110971254
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
    
    See the class-level javadoc "This Iterator will return a sorted iteration of docIDs for a given set of terms."
    
    It's the same issue that I pinged @phrocker about. This is actually violating the SKVI contract (returning records out of order), but the implementation just happens to allow it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111001139
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
    
    Ah OK I missed that.  I don't see how this iterator is NOT sorting then...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by milleruntime <gi...@git.apache.org>.

Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110992809
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
    
    You could add a comment to TermSource stating that the CQ is the docID for the term and is used for sorting.  Then note in the class comments that the inner class TermSource, is sortable by docID but this implementation is optimized and returns an unsorted list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110790242
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 steve:3
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    --- End diff --
    
    I was talking to Marc about this one: I'm doubting everything because I never realized that this iterator doesn't actually return data in sorted-order (violating the contract of SKVI). Best as I can tell, that is just how it works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110951958
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    --- End diff --
    
    Yeah, that's a good point since the big-picture was to make this "user-facing" which HeapIterator and MultiIterator are not (they're actually "system" iterators, lol). That comment likely predates the package separation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111168254
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -39,29 +39,44 @@
     import org.slf4j.LoggerFactory;
     
     /**
    - * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + * An iterator that provides a sorted-iteration of column qualifiers for a set of column families in a row.
    + * It is important to note that this iterator <em>does not</em> adhere to the contract set forth by the
    + * {@link SortedKeyValueIterator}. It returns Keys in {@code row+colqual} order instead of
    + * {@code row+colfam+colqual} order. This is required for the implementation of this iterator (to work
    + * in conjunction with the {@code IntersectingIterator}) but is a code-smell. This iterator should only
    --- End diff --
    
    Hahaha, whew. Glad to have shared a new piece of lingo with ya.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/accumulo/pull/247


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by joshelser <gi...@git.apache.org>.

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110996691
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
    
    uhh, I don't follow what you mean. The priority queue is implicitly sorted by the `Comparator` of the class being stored (TermSource)...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

Posted by phrocker <gi...@git.apache.org>.

Github user phrocker commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111038914
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +128,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
         }
    +
    +    /**
    +     * Converts the given {@code Range} into the correct {@code Range} for this TermSource (per this expected table structure) and then seeks this TermSource's
    +     * SKVI.
    +     */
    +    public void seek(Range originalRange) throws IOException {
    +      // the infinite start key is equivalent to a null startKey on the Range.
    +      if (!originalRange.isInfiniteStartKey()) {
    +        Key originalStartKey = originalRange.getStartKey();
    +        // Pivot the provided range into the range for this term
    +        Key newKey = new Key(originalStartKey.getRow(), term, originalStartKey.getColumnQualifier(), originalStartKey.getTimestamp());
    +        // Construct the new range, preserving the other attributes on the provided range.
    +        currentRange = new Range(newKey, originalRange.isStartKeyInclusive(), originalRange.getEndKey(), originalRange.isEndKeyInclusive());
    --- End diff --
    
    Are you sure you want to trust the start key  inclusivity flag if you are setting term in the new key above?  Seems like if you're adjusting the range you want to make your key inclusive. Am I wrong in my logic?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---