You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/11/18 21:55:07 UTC

[GitHub] [accumulo] ctubbsii opened a new issue #1789: Broken test: LargestFirstMemoryManagerTest

ctubbsii opened a new issue #1789:
URL: https://github.com/apache/accumulo/issues/1789


   **Test name(s)**
    - org.apache.accumulo.tserver.LargestFirstMemoryManagerTest.test
   
   **Describe the failure observed**
   ```java
   java.lang.AssertionError: expected:<1;f<> but was:<1;e<>
   	at org.apache.accumulo.tserver.LargestFirstMemoryManagerTest.test(LargestFirstMemoryManagerTest.java:117)
   ```
   
   **Testing Environment:**
    - Version of this project: 2.1.0-SNAPSHOT
    - First commit known to fail (or current commit): 5cb30ed0e9a7f3d481ea49dd126d64cc7ab9ef53
    - Jenkins shows:
   ```
   Executing Maven:  -B -f /home/jenkins/workspace/Accumulo/main/pom.xml -V -e -ntp clean verify -Dstyle.color=always -Dfailsafe.rerunFailingTestsCount=5 -Dsurefire.forkCount=2C
   Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
   Maven home: /usr/local/asfpackages/maven/apache-maven-3.6.3
   Java version: 11.0.6, vendor: Oracle Corporation, runtime: /usr/local/asfpackages/java/jdk-11.0.6
   Default locale: en_US, platform encoding: ISO-8859-1
   OS name: "linux", version: "4.15.0-74-generic", arch: "amd64", family: "unix"
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778170679


   There could be an issue with the timing of the `LargestFirstMemoryManager` class being loaded by the JVM long before (perhaps by another test since that class is used quite frequently) the test class is loaded. I don't know how we could fix that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-780109240


   > > When the test failed in #1928, it did so after 3.2 seconds. So, I'm not sure if this would be a timing issue.
   > 
   > Ah OK. You wouldn't know if the class was loaded at the same time the test started up would you? I don't know enough about the maven test plugin to know when it loads classes and when the timer on the test would start.
   
   We have maven-surefire-plugin configured to not reuse forks, so it should have been a fresh JVM, and the class loaded at the start of the test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778201865


   > We use `<reuseForks>false</reuseForks>` in `maven-surefire-plugin`, so it shouldn't use the same JVM as another test, and shouldn't be affected by another test.
   
   OK cool. So maybe just the timeout will suffice.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-729992592


   > This test passes for me locally.
   
   Me too... however, this part of the code is important enough that it makes sense to take a closer look to see if we can figure out which circumstances could have caused this and to address it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-729988483


   This test passes for me locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778183201


   We use `<reuseForks>false</reuseForks>` in `maven-surefire-plugin`, so it shouldn't use the same JVM as another test, and shouldn't be affected by another test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778166173


   I have been looking at this test for #1899 so I can try and fix it there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-785226596


   After some experimenting with the ZERO_TIME value of the test and printing different values coming out of the `LargestFirstMemoryManager`, I think I have figured out how the test occasionally fails. As to exactly how all the numbers at play come to cause it to fail, is another story. Here are the strange circumstances I created to lead to the failure.
   
   I modified the ZERO_TIME of the test to be _exactly_ 4 minutes less than the `LargestFirstMemoryManager`:
   <pre>
   public class LargestFirstMemoryManagerTest {
   
     private static final long ZERO = LargestFirstMemoryManager.ZERO_TIME - (4 * 60 * 1000);
   </pre>
   
   This will cause the test to always fail on my machine at the same spot we have seen above and with the exact same AssertionError message:
   <pre>
   [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.008 s <<< FAILURE! - in org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest
   [ERROR] org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest.test  Time elapsed: 0.93 s  <<< FAILURE!
   java.lang.AssertionError: expected:<1;f<> but was:<1;e<>
   	at org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest.test(LargestFirstMemoryManagerTest.java:118)
   </pre>
   
   The weird timing difference combined with the following oddities will lead to this failure:
   - The math used in the `LargestFirstMemoryManager` to calculate timeMemoryLoad, which is used as a key in the `LargestMap` largestMemTablets
   - The fact that the `LargestMap` inner class uses an `ArrayList` to store multiple tablets keyed at the the same Long
   - The final loop in the `LargestFirstMemoryManager` will end as soon as it hits the desired number of tablets
   
   This can be seen in these log statements:
   <pre>
   2021-02-24T11:15:57,223 [memory.LargestFirstMemoryManager] INFO : largestMemTablets.put(timeMemoryLoad 446267447, 1;a<)
   2021-02-24T11:15:57,224 [memory.LargestFirstMemoryManager] INFO : largestMemTablets.put(timeMemoryLoad 892534895, 1;b<)
   2021-02-24T11:15:57,224 [memory.LargestFirstMemoryManager] INFO : largestMemTablets.put(timeMemoryLoad 892534896, 1;c<)
   2021-02-24T11:15:57,224 [memory.LargestFirstMemoryManager] INFO : largestMemTablets.put(timeMemoryLoad 892534897, 1;d<)
   2021-02-24T11:15:57,225 [memory.LargestFirstMemoryManager] INFO : <b>largestMemTablets.put(timeMemoryLoad 892534898, 1;e<)</b>
   2021-02-24T11:15:57,225 [memory.LargestFirstMemoryManager] INFO : <b>largestMemTablets.put(timeMemoryLoad 892534898, 1;f<)</b>
   2021-02-24T11:15:57,225 [memory.LargestFirstMemoryManager] INFO : <b>adding 1;f< to list for 892534898</b>
   2021-02-24T11:15:57,226 [memory.LargestFirstMemoryManager] INFO : largestMemTablets.put(timeMemoryLoad 892534899, 1;g<)
   2021-02-24T11:15:57,226 [memory.LargestFirstMemoryManager] INFO : Last entry 892534899 has list of size = 1
   2021-02-24T11:15:57,226 [memory.LargestFirstMemoryManager] INFO : COMPACTING 1;g<  total = 6,979,321,877 ingestMemory = 6,979,321,877
   2021-02-24T11:15:57,227 [memory.LargestFirstMemoryManager] INFO : chosenMem = 1,073,741,830 chosenIT = -240.00 load 892,534,899
   2021-02-24T11:15:57,227 [memory.LargestFirstMemoryManager] INFO : <b>Last entry 892534898 has list of size = 2</b>
   2021-02-24T11:15:57,227 [memory.LargestFirstMemoryManager] INFO : COMPACTING 1;e<  total = 6,979,321,877 ingestMemory = 6,979,321,877
   2021-02-24T11:15:57,228 [memory.LargestFirstMemoryManager] INFO : chosenMem = 1,073,741,828 chosenIT = -240.00 load 892,534,898
   2021-02-24T11:15:57,228 [memory.LargestFirstMemoryManager] INFO : returning tabletsToMinorCompact with 2 tablets
   </pre>
   
   During a failure, the final loop in `LargestFirstMemoryManager` will take the next tablet in the list of the lastEntry, which may or may not be the next largest tablet since the backing `ArrayList` of `LargestMap` is not sorted. During a success, each entry has an `ArrayList` of size one, stored at distinct Long keys.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-784482418


   Multiple tests are running by default (1C), but they shouldn't be using the same JVM. So, they should be independent, at least with respect to Java VM resources (they might still compete with OS/environment resources).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime closed issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime closed issue #1789:
URL: https://github.com/apache/accumulo/issues/1789


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778445238


   When the test failed in #1928, it did so after 3.2 seconds. So, I'm not sure if this would be a timing issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778165056


   There are timing elements to the `LargestFirstMemoryManager` that I think will cause it to behave differently if the test takes too long to run. For instance, there is a variable that is clocked when the class is loaded (`ZERO_TIME`) that is used to compare for compaction idle times. The value here could become over 15 minutes (`getMinCIdleThreshold()`one of the overridden methods returns):
   https://github.com/apache/accumulo/blob/906aef9101b04c956dee71625590d79fdc7d97d4/server/tserver/src/main/java/org/apache/accumulo/tserver/memory/LargestFirstMemoryManager.java#L184
   The commit times in the test are always 0, so the idle time is going to compare against when the class was loaded.
   
   The test shouldn't take longer than 1 minute (it takes just over 1 second on my machine) so if there is a way we can add a timeout, we should.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime edited a comment on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime edited a comment on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-784366475


   I saw this happen during a failure:
   
   <pre>
   [INFO] Running org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest
   [INFO] Running org.apache.accumulo.tserver.TabletResourceManagerTest
   [ERROR] WARNING: An illegal reflective access operation has occurred
   [ERROR] WARNING: Illegal reflective access by org.easymock.cglib.core.ReflectUtils$1 (file:/home/runner/.m2/repository/org/easymock/easymock/4.2/easymock-4.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
   [ERROR] WARNING: Please consider reporting this to the maintainers of org.easymock.cglib.core.ReflectUtils$1
   [ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
   [ERROR] WARNING: All illegal access operations will be denied in a future release
   [ERROR] WARNING: An illegal reflective access operation has occurred
   [ERROR] WARNING: Illegal reflective access by org.easymock.cglib.core.ReflectUtils$1 (file:/home/runner/.m2/repository/org/easymock/easymock/4.2/easymock-4.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
   [ERROR] WARNING: Please consider reporting this to the maintainers of org.easymock.cglib.core.ReflectUtils$1
   [ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
   [ERROR] WARNING: All illegal access operations will be denied in a future release
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.674 s - in org.apache.accumulo.tserver.TabletResourceManagerTest
   [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.007 s <<< FAILURE! - in org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest
   [ERROR] org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest.test  Time elapsed: 1.721 s  <<< FAILURE!
   java.lang.AssertionError: expected:<1;f<> but was:<1;e<>
   	at org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest.test(LargestFirstMemoryManagerTest.java:118)
   </pre>
   
   I don't know what is going on but it appears that multiple tests are running at the same time and the other test completes before the LargestFirstMemoryManagerTest. [Link](https://github.com/milleruntime/accumulo/runs/1963087985?check_suite_focus=true)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime edited a comment on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime edited a comment on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778165056


   There are timing elements to the `LargestFirstMemoryManager` that I think will cause it to behave differently if the test takes too long to run. For instance, there is a variable that is clocked when the class is loaded (`ZERO_TIME`) that is used to compare for compaction idle times. The value here could become over 15 minutes (`getMinCIdleThreshold()`one of the overridden methods returns):
   https://github.com/apache/accumulo/blob/906aef9101b04c956dee71625590d79fdc7d97d4/server/tserver/src/main/java/org/apache/accumulo/tserver/memory/LargestFirstMemoryManager.java#L184
   The commit times in the test are set to the time the class is loaded, so the idle time can vary, depending when classes were loaded.
   
   The test shouldn't take longer than 1 minute (it takes just over 1 second on my machine) so if there is a way we can add a timeout, we should.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-784366475


   I saw this happen during a failure:
   
   <pre>
   [INFO] Running org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest
   [INFO] Running org.apache.accumulo.tserver.TabletResourceManagerTest
   [ERROR] WARNING: An illegal reflective access operation has occurred
   [ERROR] WARNING: Illegal reflective access by org.easymock.cglib.core.ReflectUtils$1 (file:/home/runner/.m2/repository/org/easymock/easymock/4.2/easymock-4.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
   [ERROR] WARNING: Please consider reporting this to the maintainers of org.easymock.cglib.core.ReflectUtils$1
   [ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
   [ERROR] WARNING: All illegal access operations will be denied in a future release
   [ERROR] WARNING: An illegal reflective access operation has occurred
   [ERROR] WARNING: Illegal reflective access by org.easymock.cglib.core.ReflectUtils$1 (file:/home/runner/.m2/repository/org/easymock/easymock/4.2/easymock-4.2.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
   [ERROR] WARNING: Please consider reporting this to the maintainers of org.easymock.cglib.core.ReflectUtils$1
   [ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
   [ERROR] WARNING: All illegal access operations will be denied in a future release
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.674 s - in org.apache.accumulo.tserver.TabletResourceManagerTest
   [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.007 s <<< FAILURE! - in org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest
   [ERROR] org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest.test  Time elapsed: 1.721 s  <<< FAILURE!
   java.lang.AssertionError: expected:<1;f<> but was:<1;e<>
   	at org.apache.accumulo.tserver.memory.LargestFirstMemoryManagerTest.test(LargestFirstMemoryManagerTest.java:118)
   </pre>
   
   I don't know what is going on but it appears that multiple tests are running at the same time and the other test completes before the LargestFirstMemoryManagerTest.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-780029591


   > When the test failed in #1928, it did so after 3.2 seconds. So, I'm not sure if this would be a timing issue.
   
   Ah OK. You wouldn't know if the class was loaded at the same time the test started up would you? I don't know enough about the maven test plugin to know when it loads classes and when the timer on the test would start.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1789: Broken test: LargestFirstMemoryManagerTest

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1789:
URL: https://github.com/apache/accumulo/issues/1789#issuecomment-778359835


   Added a timeout in 0d5ea4b51e5f781c35280789d281817d4bb730ee so I guess we will see if we get the timeout, then we will know its probably a timing issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org