You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by milleruntime <gi...@git.apache.org> on 2017/03/29 22:05:10 UTC

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

GitHub user milleruntime opened a pull request:

    https://github.com/apache/accumulo/pull/237

    ACCUMULO-3521: minor improvements to iterators

    Updated iterators mentioned in [ACCUMULO-3521](https://issues.apache.org/jira/browse/ACCUMULO-3521), added tests to cover untested methods and deprecated OrIterator.  Couldn't find IteratorUtil.getMaxPriority and .findIterator methods.  StatsCombiner is in examples.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/milleruntime/accumulo ACCUMULO-3521

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #237
    
----
commit 96c1e4ff9ebbefc00c57c7c5c76aeac737ca314d
Author: Mike Miller <mm...@apache.org>
Date:   2017-03-29T20:45:56Z

    ACCUMULO-3521: deprecate OrIterator

commit 808b10278e20f9222c2c398a003bee4c39676bc5
Author: Mike Miller <mm...@apache.org>
Date:   2017-03-29T20:49:18Z

    ACCUMULO-3521: improvements to FirstEntryInRowIterator and Test

commit 4366d58e60b641f4ace075d7812fc947e2e18910
Author: Mike Miller <mm...@apache.org>
Date:   2017-03-29T20:49:55Z

    ACCUMULO-3521: improvements to TypedValueCombiner and added test to CombinerTest

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > I agree it seems to break the SKVI contract
    
    +1
    
    I think that removal breaks the contract, but, for the wikisearch use-case with very large rows/shards, this bug wouldn't have been noticed. I'm not sure of anyone who has ever used the OrIterator/AndIterator for general purpose table schemas. But again, a quick unit test would likely be the proof in the puddin'.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/237#discussion_r109517652
  
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java ---
    @@ -922,4 +923,49 @@ public void testDeleteHandling() throws Exception {
         runDeleteHandlingTest(input, expected, null, paritalMajcIe, ".*ERROR.*SummingCombiner.*ACCUMULO-2232.*");
         runDeleteHandlingTest(input, expected, null, fullMajcIe, ".*ERROR.*SummingCombiner.*ACCUMULO-2232.*");
       }
    +
    +  /**
    +   * Tests the Lossy option will ignore errors in TypedValueCombiner. Uses SummingArrayCombiner to generate error.
    +   */
    +  @Test
    +  public void testLossyOption() throws IOException, IllegalAccessException, InstantiationException {
    +    Encoder<List<Long>> encoder = SummingArrayCombiner.VarLongArrayEncoder.class.newInstance();
    --- End diff --
    
    Is there a reason could not do `new VarLongArrayEncoder()`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    >  Since you have experience with it, would you be able to write a test?
    
    I have the ability and knowledge, just can't guarantee the time :)
    
    IMO, leaving this iterator out of the `user` package was intended to note that it's not great for users to directly consume. I don't see any value added to slapping a "Deprecated" on it too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    There is also another version of OrIterator in the wikisearch repo: https://github.com/apache/accumulo-wikisearch/blob/master/query/src/main/java/org/apache/accumulo/examples/wikisearch/iterator/OrIterator.java


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    \U0001f44d 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/accumulo/pull/237


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    @phrocker  do you think it should still call `sorted.add(TS)` when it removes TS from the iterator? The comments make it sound like maybe it shouldn't,  but I am not sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    @keith-turner I agree it seems to break the SKVI contract by trying to make some optimization that likely isn't of great benefit. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > MO, leaving this iterator out of the user package was intended to note that it's not great for users to directly consume. I don't see any value added to slapping a "Deprecated" on it too.
    
    Isn't that something that Deprecation does?  Tells users to use at your own risk.  Just having the class up in the "iterator" package as opposed to the "user" package doesn't portray this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/237#discussion_r109519640
  
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java ---
    @@ -53,30 +54,33 @@ private static long process(TreeMap<Key,Value> sourceMap, TreeMap<Key,Value> res
       public void test() throws IOException {
         TreeMap<Key,Value> sourceMap = new TreeMap<>();
         Value emptyValue = new Value("".getBytes());
    +    IteratorSetting iteratorSetting = new IteratorSetting(1, FirstEntryInRowIterator.class);
    +    FirstEntryInRowIterator.setNumScansBeforeSeek(iteratorSetting, 10);
    --- End diff --
    
    Will the test fail if this is not set to properly?  If not, could call iteratorSetting.getOptions() after this and check if it was really set.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    @keith-turner I'll admit I'm still trying to wrap my head around the why. I'm trying to reverse engineer it to figure out if there's something I'm missing...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    Will merge today since it seems everyone is OK with the other changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    I haven't tried to use it, mainly is because its very confusing and not documented.  I do believe you that it worked at one point.  But no one has touched the code ever (other than formatting) and without a good test, its likely to rot if it hasn't already.  Since you have experience with it, would you be able to write a test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    On the PR for adding close to iterators we noticed the OrIterator seemed to be doing something incorrect.  However, without knowing what the iterators is supposed to actually do, we were not sure :) 
    
    Under some case in seek it removes an iterator and then later adds it.  Does anyone know whats going on here?
    
    https://github.com/milleruntime/accumulo/blob/4366d58e60b641f4ace075d7812fc947e2e18910/core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java#L225


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > Isn't that something that Deprecation does?
    
    I'm not a fan of using deprecation tags to signal "YMMV". Not being in the public API does that all by itself. Currently, (for better or worse) no iterators are in the public API. Some are more risky than others, but I don't think we can use deprecation tags to meaningfully distinguish between continuous ranges of risk.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/237#discussion_r109529184
  
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java ---
    @@ -53,30 +54,33 @@ private static long process(TreeMap<Key,Value> sourceMap, TreeMap<Key,Value> res
       public void test() throws IOException {
         TreeMap<Key,Value> sourceMap = new TreeMap<>();
         Value emptyValue = new Value("".getBytes());
    +    IteratorSetting iteratorSetting = new IteratorSetting(1, FirstEntryInRowIterator.class);
    +    FirstEntryInRowIterator.setNumScansBeforeSeek(iteratorSetting, 10);
    --- End diff --
    
    Good idea. Added in 44262a314ce37d822ec96030872f10c90a8b18d8


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > should still call sorted.add(TS) when it removes TS from the iterator?
    
    That does look like a bug to me. When it is heap-ifying each column (termsource), it's saying that for the given `Range` it was seek'ed to, there are no results. However, there is no reason that this instance of the `OrIterator` couldn't be seek'ed to a new `Range` without being torn-down.
    
    However, most of the time, I would guess that the above is not actually triggered (they would get a new instance of the iterator that would re-construct the term sources).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    If there is no opposition to deprecating OrIterator, I will merge these changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > Isn't that something that Deprecation does? Tells users to use at your own risk
    
    Verbatim from JDK8
    
    "A program element annotated @Deprecated is one that programmers are discouraged from using, typically because it is dangerous, or because a better alternative exists. Compilers warn when a deprecated program element is used or overridden in non-deprecated code."
    
    Personally, I wouldn't call the OrIterator's possible code-rot/negligence, "dangerous". IIRC, the {{o.a.a.c.i.user}} package was introduced to bridge the gap between "Iterators we expect users to pull off the shelf" and "Iterators which YMMV using" (a nod towards SKVI not being in public API). I think the OrIterator's lack of love is adequately advertised by not being in {{o.a.a.c.i.user}}.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > If there is no opposition to deprecating OrIterator
    
    I'm iffy on this. It seems like you're just deprecating it without replacement because it doesn't have any tests. Is this accurate?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/237#discussion_r109529174
  
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java ---
    @@ -922,4 +923,49 @@ public void testDeleteHandling() throws Exception {
         runDeleteHandlingTest(input, expected, null, paritalMajcIe, ".*ERROR.*SummingCombiner.*ACCUMULO-2232.*");
         runDeleteHandlingTest(input, expected, null, fullMajcIe, ".*ERROR.*SummingCombiner.*ACCUMULO-2232.*");
       }
    +
    +  /**
    +   * Tests the Lossy option will ignore errors in TypedValueCombiner. Uses SummingArrayCombiner to generate error.
    +   */
    +  @Test
    +  public void testLossyOption() throws IOException, IllegalAccessException, InstantiationException {
    +    Encoder<List<Long>> encoder = SummingArrayCombiner.VarLongArrayEncoder.class.newInstance();
    --- End diff --
    
    Whoops... weird copy and paste hold over. Fixed in 4262a314ce37d822ec96030872f10c90a8b18d8


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    Created a separate PR for OrIterator: https://github.com/apache/accumulo/pull/238


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    > Mostly... its also not clear whether it actually functions correctly or not
    
    Sorry to be pedantic -- again, is this because of the lack of unit tests? Or, did you try to use it and didn't have success?
    
    Backstory is that I have a lot experience with this iterator and, while it's been years, I know it did work. I can't think of any changes that would have caused it to not work -- this is why I'm asking.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by phrocker <gi...@git.apache.org>.
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    @keith-turner it remove sources that don't match the term or don't have any values. Thus removing it from the TermSources. This seems reasonable based on dealing with a source as a term amongst a list of iterable sources. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
  
    Mostly... its also not clear whether it actually functions correctly or not. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---