Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/01 07:23:15 UTC

[GitHub] [pulsar] lhotari commented on pull request #9393: Fix testBrokerSelectionForAntiAffinityGroup by increasing OverloadedThreshold

lhotari commented on pull request #9393:
URL: https://github.com/apache/pulsar/pull/9393#issuecomment-770633183


   Nice work on this, @michaeljmarshall.
   
   However, it seems that the flakiness remains after this change.
   
   A common theme with flaky tests is that the failures happen in CI but are hard to reproduce in local environments. While working on the fix for the flaky test MessageIdTest, I found that some flaky failures can be reproduced effectively by limiting the CPU resources to roughly what is available in CI. The CI tests run on an Azure VM that has 2 cores and about 6GB free RAM (IIRC). 
   
   Since I use Linux for development, the easiest way for me to constrain the resources of the test run was to use Docker.
   
   These are the commands I used to test this change:
   ```
   $ gh pr checkout 9393
   $ mvn clean install -DskipTests -Dspotbugs.skip=true
   $ docker run --cpus=2 --memory=6g -u 1000:1000 --net=host -it --rm -v $HOME:$HOME -w $PWD -v /etc/passwd:/etc/passwd:ro ubuntu bash -c 'source "$HOME/.sdkman/bin/sdkman-init.sh"; counter=0; while mvn -Pcore-modules -pl pulsar-broker test -DfailIfNoTests=false -Dtest=AntiAffinityNamespaceGroupTest -DredirectTestOutputToFile=false -DtestRetryCount=0; do echo "----------- LOOP $counter ---------------"; ((counter++)); done; echo "LOOP $counter"' | tee docker_output_`date +%s`.log
   ```
   
   Here's the output: https://gist.github.com/lhotari/7d8c7ae0a9e1a26d92599c585ba64e13
   and here's pulsar-broker/target/surefire-reports/testng-results.xml: https://gist.github.com/lhotari/fbbcd1405d8f16be7106f8ece9f66084
   
   The test fails with this assertion error:
   
   ```
   java.lang.AssertionError: did not expect [localhost:42919] but found [localhost:42919]
   at org.testng.Assert.fail(Assert.java:99)
   at org.testng.Assert.failEquals(Assert.java:1041)
   at org.testng.Assert.assertNotEqualsImpl(Assert.java:147)
   at org.testng.Assert.assertNotEquals(Assert.java:1531)
   at org.testng.Assert.assertNotEquals(Assert.java:1535)
   at org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup(AntiAffinityNamespaceGroupTest.java:427)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
   at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
   at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
   at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   ```
   
   What comes to mind is that the test might start before both brokers are available, in which case both namespaces could end up on the only broker that is ready. 
   Maybe something like https://github.com/apache/pulsar/blob/24f759c677bfe7cbb2228cab8a38f2ebd0893945/pulsar-discovery-service/src/test/java/org/apache/pulsar/discovery/service/DiscoveryServiceTest.java#L251-L252 could help with that?
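   
   To illustrate the idea of waiting for both brokers, here is a minimal sketch of a polling helper. It is not taken from the Pulsar code base: the class name `BrokerReadiness`, the method `awaitBrokers` and its parameters are hypothetical, and how the test would obtain the number of available brokers is left open as an assumption.
   
   ```java
   import java.time.Duration;
   import java.util.function.IntSupplier;
   
   // Hypothetical helper (not part of Pulsar): polls until enough brokers are reported.
   public class BrokerReadiness {
   
       /**
        * Repeatedly asks availableBrokerCount for the current number of brokers.
        * Returns true as soon as the count reaches expected, or false once the
        * timeout has elapsed without reaching it.
        */
       public static boolean awaitBrokers(IntSupplier availableBrokerCount,
                                          int expected,
                                          Duration timeout,
                                          Duration pollInterval) throws InterruptedException {
           long deadline = System.nanoTime() + timeout.toNanos();
           while (true) {
               if (availableBrokerCount.getAsInt() >= expected) {
                   return true;
               }
               // Overflow-safe comparison of nanoTime values.
               if (System.nanoTime() - deadline >= 0) {
                   return false;
               }
               Thread.sleep(pollInterval.toMillis());
           }
       }
   }
   ```
   
   In the test this could run before the broker selection assertions, with a supplier that returns how many brokers are currently visible to the load manager, and the test would fail fast with a clear message if `awaitBrokers` returns false.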
   
   
   

