Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/19 04:20:58 UTC

[GitHub] [pulsar] lhotari opened a new issue #9622: Flaky-test: CLITest (and possibly a lot of other integration tests)

lhotari opened a new issue #9622:
URL: https://github.com/apache/pulsar/issues/9622


   CLITest is flaky. It fails when test retries are disabled. This problem can be reproduced by running CLITest in IntelliJ.
   **It seems that the same root cause is causing test failures in other integration tests**.
   
   In the test, bookies die after logging an error message like `failed to allocate 16777216 byte(s) of direct memory (used: 536870912, max: 536870912)`.
   
   To investigate the issue, I set the environment variable [`TESTCONTAINERS_RYUK_DISABLED=true`](https://www.testcontainers.org/features/configuration/#disabling-ryuk) to disable Testcontainers' automatic container cleanup. I also temporarily added these lines to the `PulsarCluster.stop` method in my local checkout:
   ```
        public synchronized void stop() {
   +        boolean leaveContainersRunning = Boolean.parseBoolean(System.getenv("TESTCONTAINERS_RYUK_DISABLED"));
   +        if (leaveContainersRunning) {
   +            log.warn("Pulsar cluster is left running since TESTCONTAINERS_RYUK_DISABLED=true.");
   +            return;
   +        }
   ```
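A minimal, self-contained sketch of the guard used in the diff above, to show how the flag is interpreted. The class name `LeaveContainersRunningSketch` and the map-based helper are hypothetical and exist only so the check can be exercised without touching the real process environment; only `Boolean.parseBoolean` and the variable name come from the change above.

```java
import java.util.Map;

public class LeaveContainersRunningSketch {

    // Hypothetical helper: takes the environment as a map so it can be tested in isolation.
    public static boolean leaveContainersRunning(Map<String, String> env) {
        // Boolean.parseBoolean returns true only for "true" (ignoring case);
        // a missing variable (null) or any other value such as "1" gives false.
        return Boolean.parseBoolean(env.get("TESTCONTAINERS_RYUK_DISABLED"));
    }

    public static void main(String[] args) {
        // Uses the real environment of the current process.
        System.out.println(leaveContainersRunning(System.getenv()));
    }
}
```

Because an unset variable parses as `false`, a guard of this shape leaves the normal cleanup path unchanged unless the variable is explicitly set to `true`.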
   With the containers left running, it's possible to use `docker exec -it CLITest-euiia-pulsar-bookie-0 bash` to get a shell and view the `/var/log/pulsar/bookie.log` file, which is created by the [`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/master/tests/docker-images/latest-version-image/conf/bookie.conf) config used to run bookies in the `apachepulsar/pulsar-test-latest-version` Docker image. This file contained the following error message:
   
   ```
   08:26:53.194 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log file because current active log channel has not initialized yet
   08:26:53.195 [SyncThread-7-1] ERROR org.apache.bookkeeper.proto.BookieServer - Unable to allocate memory, exiting bookie
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 536870912, max: 536870912)
   ```
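To make the numbers in this message easier to read, here is a small sketch that extracts the three byte counts and converts them to MiB. The class `DirectMemoryErrorSketch` and its methods are hypothetical helpers written for this illustration; the regular expression is an assumption based only on the message text quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DirectMemoryErrorSketch {

    // Pattern derived from the message text in the log above (an assumption, not a Netty API).
    private static final Pattern MESSAGE = Pattern.compile(
            "failed to allocate (\\d+) byte\\(s\\) of direct memory \\(used: (\\d+), max: (\\d+)\\)");

    /** Returns {requested, used, max} in bytes, or null if the text does not match. */
    public static long[] parse(String text) {
        Matcher m = MESSAGE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))
        };
    }

    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long[] v = parse("io.netty.util.internal.OutOfDirectMemoryError: failed to allocate "
                + "16777216 byte(s) of direct memory (used: 536870912, max: 536870912)");
        System.out.printf("requested=%d MiB, used=%d MiB, max=%d MiB, headroom=%d bytes%n",
                toMiB(v[0]), toMiB(v[1]), toMiB(v[2]), v[2] - v[1]);
    }
}
```

For the message above, this reports a 16 MiB request against a 512 MiB limit that is already fully used, so there is no headroom left for the allocation.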
   
   After restarting the bookie service with `supervisorctl start bookie`, the bookie started again and I was able to check the `java` process command line:
   ```
   /usr/local/openjdk-8/bin/java -cp /pulsar/conf:::/pulsar/lib/*: -Dlog4j.configurationFile=log4j2.yaml -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true -Xmx128M -XX:MaxDirectMemorySize=512M -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024 -Dpulsar.log.appender=RoutingAppender -Dpulsar.log.dir=/pulsar/logs -Dpulsar.log.level=info -Dpulsar.log.root.level=info -Dpulsar.routing.appender.default=Console -Dlog4j2.is.webapp=false -Dpulsar.functions.process.container.log.dir=/pulsar/logs -Dpulsar.functions.java.instance.jar=/pulsar/instances/java-instance.jar -Dpulsar.functions.python.instance.file=/pulsar/instances/python-instance/python_instance_main.py -Dpulsar.functions.extra.dependencies.dir=/pulsar/instances/deps -Dpulsar.functions.instance.classpath=/pulsar/conf:::/pulsar/lib/*: -Dpulsar.log.file=bookkeeper.log org.apache.bookkeeper.server.Main --conf /pulsar/conf/bookkeeper.conf
   ```
   
   I can see that `-XX:MaxDirectMemorySize=512M` is passed correctly. I noticed that the `PULSAR_GC` setting in [`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/master/tests/docker-images/latest-version-image/conf/bookie.conf) isn't used, so I created #9621 to fix that. However, that's not the reason for the flakiness.
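As a quick cross-check that the flag on the command line matches the `max: 536870912` value in the error, the sketch below pulls `-XX:MaxDirectMemorySize` out of a command-line string and converts it to bytes. The class `JvmSizeFlagSketch` and both methods are hypothetical, and the suffix handling (K, M, G as powers of 1024) is a simplified assumption rather than the JVM's own parser.

```java
public class JvmSizeFlagSketch {

    // Converts values such as "512M" or "1G" to bytes; a plain number is taken as bytes.
    public static long parseSize(String value) {
        char suffix = Character.toUpperCase(value.charAt(value.length() - 1));
        long multiplier;
        switch (suffix) {
            case 'K': multiplier = 1024L; break;
            case 'M': multiplier = 1024L * 1024L; break;
            case 'G': multiplier = 1024L * 1024L * 1024L; break;
            default: return Long.parseLong(value);
        }
        return Long.parseLong(value.substring(0, value.length() - 1)) * multiplier;
    }

    // Returns the configured limit in bytes, or -1 if the flag is absent.
    public static long maxDirectMemoryFromCommandLine(String commandLine) {
        String prefix = "-XX:MaxDirectMemorySize=";
        for (String token : commandLine.trim().split("\\s+")) {
            if (token.startsWith(prefix)) {
                return parseSize(token.substring(prefix.length()));
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the command line shown above.
        String cmd = "java -Xmx128M -XX:MaxDirectMemorySize=512M -XX:+UseG1GC";
        System.out.println(maxDirectMemoryFromCommandLine(cmd));
    }
}
```

512M works out to 536870912 bytes, the same figure reported as `max` in the `OutOfDirectMemoryError`, which is consistent with the limit being applied and simply exhausted.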
   
   In [`tests/docker-images/latest-version-image/conf/bookie.conf`](https://github.com/apache/pulsar/blob/cf63ae8480e6b03aca437b658cc10a935129a819/tests/docker-images/latest-version-image/conf/bookie.conf#L25) there is configuration that sets `dbStorage_writeCacheMaxSizeMb="16",dbStorage_readAheadCacheMaxSizeMb="16"`. However, this has no effect: the `apply-config-from-env.py` script is called in [`tests/docker-images/latest-version-image/scripts/run-bookie.sh`](https://github.com/apache/pulsar/blob/cf63ae8480e6b03aca437b658cc10a935129a819/tests/docker-images/latest-version-image/scripts/run-bookie.sh#L21-L22), and nothing applies the environment variables defined in the supervisord config to `conf/bookkeeper.conf`. Therefore, to fix the issue, the `dbStorage_*` settings should be set directly in the `run-bookie.sh` script.
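The general mechanism at play can be sketched as follows: a step that rewrites `key=value` lines in a config file can only apply variables that are present in its own environment. This is a simplified illustration and not the logic of `apply-config-from-env.py`; the class `EnvOverrideSketch`, its method, and the sample config lines are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class EnvOverrideSketch {

    // Replaces the value of each uncommented key=value line whose key is present in env.
    public static List<String> applyOverrides(List<String> confLines, Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String line : confLines) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (!trimmed.startsWith("#") && eq > 0) {
                String key = trimmed.substring(0, eq).trim();
                if (env.containsKey(key)) {
                    result.add(key + "=" + env.get(key));
                    continue;
                }
            }
            // Comments, blank lines, and keys without an override pass through unchanged.
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not the real contents of conf/bookkeeper.conf.
        List<String> conf = Arrays.asList("# cache sizes", "dbStorage_writeCacheMaxSizeMb=");

        // Variable visible to the rewrite step: the value is applied.
        System.out.println(applyOverrides(conf,
                Collections.singletonMap("dbStorage_writeCacheMaxSizeMb", "16")));

        // Variable defined somewhere the rewrite step never sees: nothing changes.
        System.out.println(applyOverrides(conf, Collections.<String, String>emptyMap()));
    }
}
```

The second call mirrors the situation described above: if the `dbStorage_*` values are not in the environment of whatever rewrites the config file, the file is left as it was.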


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #9622: Flaky-test: CLITest (and possibly a lot of other integration tests because of the same reason)

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #9622:
URL: https://github.com/apache/pulsar/issues/9622#issuecomment-781810418


   There's PR #9623 to fix this issue.





[GitHub] [pulsar] merlimat closed issue #9622: Flaky-test: CLITest (and possibly a lot of other integration tests because of the same reason)

Posted by GitBox <gi...@apache.org>.
merlimat closed issue #9622:
URL: https://github.com/apache/pulsar/issues/9622


   

