You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metron.apache.org by cestella <gi...@git.apache.org> on 2017/02/23 13:56:02 UTC

[GitHub] incubator-metron pull request #463: METRON-728: ReaderSpliteratorTest fails ...

GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/463

    METRON-728: ReaderSpliteratorTest fails randomly and extremely rarely

    This unit test is sporadic, so it's less sporadic now.
    
    * I ran the medium batch tests in question 100k times and validated that it fails without the change at least once.
    * I ran the medium batch tests in question 100k time after the change and validated that it does not fail.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-728

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #463
    
----
commit 46e7df89210a608e936082e1fa1acb21447f92fd
Author: cstella <ce...@gmail.com>
Date:   2017-02-23T13:51:33Z

    METRON-728: ReaderSpliteratorTest fails randomly and extremely rarely

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    We actually need it for all of the tests that ensure actual parallelism and I've strengthened the other test impacted.  We had gotten around it by being too lenient, honestly.  The rest of the test cases are non-parallel, so they are correct as implemented.
    
    This is the correct fix, IMO.  `stream.parallel()` does not give any explicit guarantees about parallelism, just that it can parallelize batches across a thread pool.  This test essentially ensures that given enough tries, it does parallelize.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Yep, whole test 100k times red-green style.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    @cestella @justinleet Hm, those should theoretically be the same in this case. doAnswer() is just a different implementation for dealing with void return methods. http://static.javadoc.io/org.mockito/mockito-core/2.7.11/org/mockito/Mockito.html#doAnswer(org.mockito.stubbing.Answer)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Why do we need this for only one of our tests?  Why are other tests not impacted by this issue?
    Do you think this is the best fix or a 'good-enough' band-aid?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    I reacted to your request, @justinleet .  Also, I would like to put out a request for the community:
    
    The `delegatingSpliterator` pattern that I used uses polymorphism to construct a delegate which allows me to count splits.  I wanted to investigate using mocks to do this in a cleaner way, but I kept getting off-by-one issues.  Here's what I did as a replacement for `delegatingSpliterator`
    
    ```
        Spliterator<String> delegatingSpliterator = spy(spliterator);
        when(delegatingSpliterator.trySplit()).thenAnswer(answer -> {
          Spliterator<String> ret = spliterator.trySplit();
            if(ret != null) {
              numSplits.incrementAndGet();
            }
            return ret;
        });
    ```
    
    What I see is that I am processing 1 fewer splits than I expect.  What we have in the PR is perfectly sufficient and inoffensive, but I wanted to learn a bit more about mockito and thought if anyone had insight, we could all get better. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Yep, my +1 is still in place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    @cestella that is a much better way of stating it, and exactly what I was alluding to.  I'll look through the new commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    +1, I appreciate you going ahead and taking this ticket, given that I've been bitten by it twice now.  Looks great.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Thanks!  I don't quite get why that works and mine returns one fewer split though.  Anyone have an insight on that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    I figured out why
    
    From the docs:
    ```
    
       List list = new LinkedList();
       List spy = spy(list);
    
       //Impossible: real method is called so spy.get(0) throws IndexOutOfBoundsException (the list is yet empty)
       when(spy.get(0)).thenReturn("foo");
    
       //You have to use doReturn() for stubbing
       doReturn("foo").when(spy).get(0);
    ```
    So the first call to @cestella's trySplit() is in the when(), not in the lambda.  The subsequent calls are.  So everything shifts by one.
    
    As the docs note "Sometimes it's impossible or impractical to use when(Object) for stubbing spies. Therefore when using spies please consider doReturn|Answer|Throw() family of methods for stubbing."
    
    I just happened to try this, and now I know why it works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #463: METRON-728: ReaderSpliteratorTest fails ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/463


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Good catch @justinleet! That was my next suspicion and you beat me to it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    @mmiklavc The stream can consider itself parallel without it actually functioning in parallel.  If we have one big split, only one thread will be used in the threadpool, so it'll be de facto serial.  Also, we don't actually implement Stream, we *construct* a stream from a Spliterator.
    
    The issue that @justinleet alluded to was that we were testing `Spliterator` indirectly by testing `Stream`s in a threadpool.  If we just did the direct test, we'd narrow the scope and remove the variability as well.  If we are, then we should be able to trust `Stream.parallel()` to do the right thing rather than testing that.  What we really want to know is are we splitting properly. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #463: METRON-728: ReaderSpliteratorTest fails ...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/463#discussion_r102776136
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/utils/file/ReaderSpliteratorTest.java ---
    @@ -97,88 +110,73 @@ public void testSequentialStreamLargeBatch() throws FileNotFoundException {
           Map<String, Integer> count =
                   stream.map(s -> s.trim())
                           .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
    -      Assert.assertEquals(5, count.size());
    -      Assert.assertEquals(3, (int) count.get("foo"));
    -      Assert.assertEquals(2, (int) count.get("bar"));
    -      Assert.assertEquals(1, (int) count.get("and"));
    -      Assert.assertEquals(1, (int) count.get("the"));
    +      validateMapCount(count);
         }
       }
     
    -  @Test
    -  public void testActuallyParallel() throws ExecutionException, InterruptedException, FileNotFoundException {
    -    //With 9 elements and a batch of 2, we should only ceil(9/2) = 5 batches, so at most min(5, 2) = 2 threads will be used
    -    try( Stream<String> stream = ReaderSpliterator.lineStream(getReader(), 2)) {
    -      ForkJoinPool forkJoinPool = new ForkJoinPool(2);
    -      forkJoinPool.submit(() -> {
    -                Map<String, Integer> threads =
    -                        stream.parallel().map(s -> Thread.currentThread().getName())
    -                                .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
    -                Assert.assertTrue(threads.size() <= 2);
    -              }
    -      ).get();
    -    }
    -  }
    +  private int getNumberOfBatches(final ReaderSpliterator spliterator) throws ExecutionException, InterruptedException {
    +    final AtomicInteger numSplits = new AtomicInteger(0);
    +    //we want to wrap the spliterator and count the (valid) splits
    +    Spliterator<String> delegatingSpliterator = new Spliterator<String>() {
    +      @Override
    +      public boolean tryAdvance(Consumer<? super String> action) {
    +        return spliterator.tryAdvance(action);
    +      }
     
    -  @Test
    -  public void testActuallyParallel_mediumBatch() throws ExecutionException, InterruptedException, FileNotFoundException {
    -    //With 9 elements and a batch of 2, we should only ceil(9/2) = 5 batches, so at most 5 threads of the pool of 10 will be used
    -    try( Stream<String> stream = ReaderSpliterator.lineStream(getReader(), 2)) {
    -      ForkJoinPool forkJoinPool = new ForkJoinPool(10);
    -      forkJoinPool.submit(() -> {
    -                Map<String, Integer> threads =
    -                        stream.parallel().map(s -> Thread.currentThread().getName())
    -                                .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
    -                Assert.assertTrue(threads.size() <= (int) Math.ceil(9.0 / 2) && threads.size() > 1);
    -              }
    -      ).get();
    -    }
    +      @Override
    +      public Spliterator<String> trySplit() {
    +        Spliterator<String> ret = spliterator.trySplit();
    +        if(ret != null) {
    +          numSplits.incrementAndGet();
    +        }
    +        return ret;
    +      }
    +
    +      @Override
    +      public long estimateSize() {
    +        return spliterator.estimateSize();
    +      }
    +
    +      @Override
    +      public int characteristics() {
    +        return spliterator.characteristics();
    +      }
    +    };
    +
    +    Stream<String> stream = StreamSupport.stream(delegatingSpliterator, true);
    +
    +    //now run it in a parallel pool and do some calculation that doesn't really matter.
    +    ForkJoinPool forkJoinPool = new ForkJoinPool(10);
    --- End diff --
    
    Incredibly minor point, but since we no longer care about the actual execution and aren't running it a lot, it seems appropriate to just use ForkJoinPool.commonPool(), and drop the shutdown line.
    
    This is entirely up to you if you want to change, I don't consider it blocking by any means.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    I'm +1 on this latest incarnation. Pulled the branch locally, reviewed, and verified metron-common `mvn clean install` runs successfully. Thanks for taking this on @cestella. The code is cleaner and the intent is clear to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    @cestella Spy() syntax ends up working differently than mock() from what I can tell.
    
    This worked for me
    ```
        Spliterator<String> delegatingSpliterator = spy(spliterator);
        doAnswer(invocationOnMock -> {
          Spliterator<String> ret = spliterator.trySplit();
          if(ret != null) {
            numSplits.incrementAndGet();
          }
          return ret;
        }).when(delegatingSpliterator).trySplit();
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    @justinleet Thanks!  I replaced the polymorphic delegate with the cleaner mock version *and* I learned a bit about how mocking works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Nevermind, I can't read.  You ran the whole test 100k times, correct?  I'm fine with that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    @cestella The more I'm thinking about this, the more I wonder if this test is inherently structured incorrectly. My thinking is that it seems more like we're testing whether or not a stream can run in parallel, rather than that the stream produced by the spliterator meets the appropriate contracts for a stream.
    
    Is there a way to restructure this so that it just tests "Does this meet the criteria of a Java stream?", rather than "Can a stream in Java run in parallel?"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Yeah, I can get behind that, @justinleet I'd state it slightly differently though.  It's not that we want to test the Stream so much as we want to test that the Spliterator, when interacting with a stream, makes the right number of splits.  That was the core problem with `Files.lines()` was that it made one split, not many splits.
    
    What we really want to test here is that the spliterator splits the expected number of times.  What we have indirectly tests that, but it relies on the ForkPool to work in a consistent way, which isn't going to happen.  Anyway, I pushed a commit, see how you like it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    Are we settling on "less sporadic"?  Like I noted in the ticket, I had the original test run for over a minute (~90 seconds) before the JVM decided to actually be single threaded.  It's not the usual case but I probably only ran it 20 or so times before I hit the 90 second case.
    
    It seems more likely to fail in Travis, which is fine, but I'm not sure I want my local build failing that often.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #463: METRON-728: ReaderSpliteratorTest fails randoml...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/463
  
    > We actually need it for all of the tests that ensure actual parallelism and I've strengthened the other test impacted. We had gotten around it by being too lenient, honestly. The rest of the test cases are non-parallel, so they are correct as implemented.
    
    > This is the correct fix, IMO. stream.parallel() does not give any explicit guarantees about parallelism, just that it can parallelize batches across a thread pool. This test essentially ensures that given enough tries, it does parallelize.
    
    Ok, I follow what's going on here and support the need to test for instances where you may end up with the same thread by looping. 
    
    Just for some clarification, what is the reason that we are not leveraging stream.isParallel() to check for parallelism? Would the concern here be that because we're returning an object the implements the Stream interface, the ReaderSpliterator could be changed to return something that  does not abide by the isParallel() contract?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---