You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@metron.apache.org by nickwallen <gi...@git.apache.org> on 2018/05/15 14:34:46 UTC

[GitHub] metron pull request #1015: METRON-1544 Flaky test: org.apache.metron.stellar...

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1015

    METRON-1544 Flaky test: org.apache.metron.stellar.common.CachingStell…

    This fixes the `CachingStellarProcessorTest` which is failing intermittently.
    
    ```
    Running org.apache.metron.stellar.common.CachingStellarProcessorTest
    Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec <<< FAILURE! - in org.apache.metron.stellar.common.CachingStellarProcessorTest
    testCaching(org.apache.metron.stellar.common.CachingStellarProcessorTest) Time elapsed: 0.053 sec <<< FAILURE!
    java.lang.AssertionError: expected:<6> but was:<4>
     at org.junit.Assert.fail(Assert.java:88)
     at org.junit.Assert.failNotEquals(Assert.java:834)
     at org.junit.Assert.assertEquals(Assert.java:645)
     at org.junit.Assert.assertEquals(Assert.java:631)
     at org.apache.metron.stellar.common.CachingStellarProcessorTest.testCaching(CachingStellarProcessorTest.java:73)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
    ```
    
    ### Changes
    
    * Added a property `stellar.cache.record.stats` that allows recording of cache usage stats. 
    
    * Updated the tests to use the stats produced by Caffeine.
    
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [ ] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1544

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1015
    
----
commit 4c6c970ad9abdac9a0f9b52851c6b73688d3059d
Author: Nick Allen <ni...@...>
Date:   2018-05-15T14:31:21Z

    METRON-1544 Flaky test: org.apache.metron.stellar.common.CachingStellarProcessorTest#testCaching

----


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    Is there not some relevant Caffeine test we can ape for this?


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    In the latest, I removed the problematic test case.  There is no need to test the eviction policy of the underlying cache, which is effectively what is happening.  If anyone can find gaps in test coverage, please let me know.


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    +1 for sure


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    Ok, now its really working.  
    
    The second problem was that there is some variability in which items get expired from the cache according to the LRU algorithm.  In about 1/250 runs the algorithm would not choose the expected item to expire.  I imagine this is only a problem when running at such small scale with a cache size of 2.
    
    I updated the test case so that it looks into the cache to determine whether a hit should occur or not.  The test passes, but I am not sure if the test is really useful.


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by ben-manes <gi...@git.apache.org>.
Github user ben-manes commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    Cache eviction is performed asynchronously by default using the FJP common pool. For testing you can set `executor(Runnable::run)` and it will be on the calling thread. The penalty is small so it’s sometimes used in practice anyway.
    
    The eviction is not LRU and has a random seed to guard against HashDoS attacks. This means that which entry is not predictable without forcing the seed, which is not exposed. Typically only the cache’s test should do that. You’re code shouldn’t make assumptions on what is evicted as we will try to maximize the hit rate using various techniques.


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    Daft.  Still a problem.  In master, run repetitively the test fails easily 3/10 times for me.  With the latest in this PR, it is failing 1/40 times or so, so I'm still not there.
    
    So even explicitly performing cache maintenance does not mean that the cache size will always be exactly what we expect.
    
    I am wondering if we even need such low level validation as what we have in testWithFullCache() now.


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    Can you call out the lines relevant to the fix for reviewers?


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    > @mmiklavc: Can you call out the lines relevant to the fix for reviewers?
    
    I provided some more color in the description this time.  Also, copied below.
    
    > The core problem is that the test makes assumptions about the cache state which are not always true. Specifically, even though the cache max size is 2, the cache can exceed that size for short periods of time until cache maintenance is performed.
    
    > Since the cache can be larger than expected for brief periods of time, the expectations of what is a hit or miss were incorrect. This is why the test would fail intermittently.
    
    > To address this, I rewrote the test so that the cache stats are validated after each expression is run and I also perform cache maintenance between each execution and validation of cache state.
    
    > Without the additional logging, the ability to grab more detailed cache stats, and breaking down that test case, I wouldn't have been able to uncover the bug.


---

[GitHub] metron pull request #1015: METRON-1544 Flaky test: org.apache.metron.stellar...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen closed the pull request at:

    https://github.com/apache/metron/pull/1015


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    lgtm, +1


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    @mmiklavc bump. Did I answer your question?  How is this looking?


---

[GitHub] metron pull request #1015: METRON-1544 Flaky test: org.apache.metron.stellar...

Posted by nickwallen <gi...@git.apache.org>.
GitHub user nickwallen reopened a pull request:

    https://github.com/apache/metron/pull/1015

    METRON-1544 Flaky test: org.apache.metron.stellar.common.CachingStell…

    This fixes the `CachingStellarProcessorTest` which is failing intermittently.
    
    ```
    Running org.apache.metron.stellar.common.CachingStellarProcessorTest
    Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec <<< FAILURE! - in org.apache.metron.stellar.common.CachingStellarProcessorTest
    testCaching(org.apache.metron.stellar.common.CachingStellarProcessorTest) Time elapsed: 0.053 sec <<< FAILURE!
    java.lang.AssertionError: expected:<6> but was:<4>
     at org.junit.Assert.fail(Assert.java:88)
     at org.junit.Assert.failNotEquals(Assert.java:834)
     at org.junit.Assert.assertEquals(Assert.java:645)
     at org.junit.Assert.assertEquals(Assert.java:631)
     at org.apache.metron.stellar.common.CachingStellarProcessorTest.testCaching(CachingStellarProcessorTest.java:73)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
    ```
    
    ### Changes
    
    * Added a property `stellar.cache.record.stats` that allows recording of cache usage stats. 
    
    * Updated the tests to use the stats produced by Caffeine.
    
    
    ## Pull Request Checklist
    
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [x] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1544

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1015
    
----
commit 4c6c970ad9abdac9a0f9b52851c6b73688d3059d
Author: Nick Allen <ni...@...>
Date:   2018-05-15T14:31:21Z

    METRON-1544 Flaky test: org.apache.metron.stellar.common.CachingStellarProcessorTest#testCaching

commit 3d023d7565d0e55973c3184ba29f3c6d833bb5da
Author: Nick Allen <ni...@...>
Date:   2018-05-15T14:52:12Z

    Added additional clarification around how only variables used in the expression are used to build the cache key

----


---

[GitHub] metron pull request #1015: METRON-1544 Flaky test: org.apache.metron.stellar...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen closed the pull request at:

    https://github.com/apache/metron/pull/1015


---

[GitHub] metron pull request #1015: METRON-1544 Flaky test: org.apache.metron.stellar...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metron/pull/1015


---

[GitHub] metron pull request #1015: METRON-1544 Flaky test: org.apache.metron.stellar...

Posted by nickwallen <gi...@git.apache.org>.
GitHub user nickwallen reopened a pull request:

    https://github.com/apache/metron/pull/1015

    METRON-1544 Flaky test: org.apache.metron.stellar.common.CachingStell…

    This fixes the `CachingStellarProcessorTest` which is failing intermittently.
    
    ```
    Running org.apache.metron.stellar.common.CachingStellarProcessorTest
    Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec <<< FAILURE! - in org.apache.metron.stellar.common.CachingStellarProcessorTest
    testCaching(org.apache.metron.stellar.common.CachingStellarProcessorTest) Time elapsed: 0.053 sec <<< FAILURE!
    java.lang.AssertionError: expected:<6> but was:<4>
     at org.junit.Assert.fail(Assert.java:88)
     at org.junit.Assert.failNotEquals(Assert.java:834)
     at org.junit.Assert.assertEquals(Assert.java:645)
     at org.junit.Assert.assertEquals(Assert.java:631)
     at org.apache.metron.stellar.common.CachingStellarProcessorTest.testCaching(CachingStellarProcessorTest.java:73)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
    ```
    
    ### Changes
    
    * Added a property `stellar.cache.record.stats` that allows recording of cache usage stats. 
    
    * Updated the tests to use the stats produced by Caffeine.
    
    
    ## Pull Request Checklist
    
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [x] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1544

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1015
    
----
commit 4c6c970ad9abdac9a0f9b52851c6b73688d3059d
Author: Nick Allen <ni...@...>
Date:   2018-05-15T14:31:21Z

    METRON-1544 Flaky test: org.apache.metron.stellar.common.CachingStellarProcessorTest#testCaching

commit 3d023d7565d0e55973c3184ba29f3c6d833bb5da
Author: Nick Allen <ni...@...>
Date:   2018-05-15T14:52:12Z

    Added additional clarification around how only variables used in the expression are used to build the cache key

commit 928b924eb38b50dc1d655f929a7dbe4b2925a21e
Author: Nick Allen <ni...@...>
Date:   2018-05-15T20:15:58Z

    Needed to ensure cache maintenence was performed, otherwise assumptions made in the test about the cache state can be incorrect

----


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    Triggering the CI build to ensure tests don't fail.


---

[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1015
  
    > @ben-manes You’re code shouldn’t make assumptions on what is evicted as we will try to maximize the hit rate using various techniques.
    
    That makes a ton of sense.  Thanks for the tips.


---