You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2017/07/18 22:47:00 UTC

[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

    [ https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092286#comment-16092286 ] 

Ewen Cheslack-Postava commented on KAFKA-5608:
----------------------------------------------

It looks like the JmxTool really did take too long to generate output:

{quote}
[INFO  - 2017-07-18 14:25:44,820 - kafka - bootstrap_servers - lineno:549]: Bootstrap client port is: 9095
[DEBUG - 2017-07-18 14:25:44,820 - console_consumer - _worker - lineno:255]: Console consumer 1 command: export JMX_PORT=9192; export LOG_DIR=/mnt/console_consumer/logs; export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/console_consumer/tools-log4j.properties"; export KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf -Djava.security.krb5.conf=/mnt/security/krb5.conf"; /opt/kafka-dev/bin/kafka
-console-consumer.sh --topic test_topic --consumer.config /mnt/console_consumer/console_consumer.properties --bootstrap-server worker11:9095,worker8:9095,worker4:9095 --isolation-level read_uncommitted --from-beginning --timeout-ms 60000 --formatter kafka.tools.LoggingMessageFormatter 2>> /mnt/console_consumer/console_consumer.stderr | tee -a /mnt/console_consumer/console_consumer.stdout &
[DEBUG - 2017-07-18 14:25:44,820 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: export JMX_PORT=9192; export LOG_DIR=/mnt/console_consumer/logs; export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/console_consumer/tools-log4j.properties"; export KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf -Djava.security.krb5.conf=/mnt/security/krb5.conf"; /opt/kafka-dev/bin/ka
fka-console-consumer.sh --topic test_topic --consumer.config /mnt/console_consumer/console_consumer.properties --bootstrap-server worker11:9095,worker8:9095,worker4:9095 --isolation-level read_uncommitted --from-beginning --timeout-ms 60000 --formatter kafka.tools.LoggingMessageFormatter 2>> /mnt/console_consumer/console_consumer.stderr | tee -a /mnt/console_consumer/console_consumer.stdout &
[DEBUG - 2017-07-18 14:25:44,823 - console_consumer - _worker - lineno:260]: collecting following jmx objects: ['kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer']
[DEBUG - 2017-07-18 14:25:44,823 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:44,929 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:45,035 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:45,141 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:45,148 - jmx - start_jmx_tool - lineno:71]: ubuntu@worker7: Start JmxTool 1 command: /opt/kafka-dev/bin/kafka-run-class.sh kafka.tools.JmxTool --reporting-interval 1000 --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi --object-name kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer --attributes assigned-partitions 1>> /mnt/jmx_tool.log 2>> /mnt/jmx_tool.err.log &
[DEBUG - 2017-07-18 14:25:45,148 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: /opt/kafka-dev/bin/kafka-run-class.sh kafka.tools.JmxTool --reporting-interval 1000 --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi --object-name kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer --attributes assigned-partitions 1>> /mnt/jmx_tool.log 2>> /mnt/jmx_tool.err.log &
[DEBUG - 2017-07-18 14:25:45,210 - remoteaccount - _log - lineno:158]: ubuntu@worker10: Running ssh command: mkdir -p /mnt/security
[DEBUG - 2017-07-18 14:25:45,215 - remoteaccount - _log - lineno:158]: ubuntu@worker10: Running ssh command: java -version
[DEBUG - 2017-07-18 14:25:45,216 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[INFO  - 2017-07-18 14:25:45,356 - verifiable_producer - _worker - lineno:131]: verifiable_producer.properties:
[INFO  - 2017-07-18 14:25:45,356 - verifiable_producer - _worker - lineno:132]:
ssl.endpoint.identification.algorithm=HTTPS
sasl.kerberos.service.name=kafka
security.protocol=SASL_SSL
ssl.keystore.location=/mnt/security/test.keystore.jks
ssl.truststore.location=/mnt/security/test.truststore.jks
ssl.keystore.password=test-ks-passwd
sasl.mechanism=GSSAPI
ssl.truststore.password=test-ts-passwd
ssl.key.password=test-key-passwd
sasl.mechanism.inter.broker.protocol=GSSAPI

request.timeout.ms=30000

[DEBUG - 2017-07-18 14:25:45,359 - kafka_path - path - lineno:92]: Using path resolver KafkaSystemTestPathResolver
[INFO  - 2017-07-18 14:25:45,359 - kafka - bootstrap_servers - lineno:549]: Bootstrap client port is: 9095
[DEBUG - 2017-07-18 14:25:45,359 - verifiable_producer - _worker - lineno:136]: VerifiableProducer 1 command: export LOG_DIR=/mnt/verifiable_producer/logs; export KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf -Djava.security.krb5.conf=/mnt/security/krb5.conf"; export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/verifiable_producer/tools-log4j.properties"; /opt/kafka-dev/bin/kafka-run-class.sh org.apache.kafka.tools.VerifiableProducer --topic test_topic --broker-list worker11:9095,worker8:9095,worker4:9095 --throughput 100 --producer.config /mnt/verifiable_producer/verifiable_producer.properties 2>> /mnt/verifiable_producer/verifiable_producer.stdout | tee -a /mnt/verifiable_producer/verifiable_producer.stdout &
[DEBUG - 2017-07-18 14:25:45,359 - remoteaccount - _log - lineno:158]: ubuntu@worker10: Running ssh command: export LOG_DIR=/mnt/verifiable_producer/logs; export KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf -Djava.security.krb5.conf=/mnt/security/krb5.conf"; export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/verifiable_producer/tools-log4j.properties"; /opt/kafka-dev/bin/kafka-run-class.sh org.apache.kafka.tools.VerifiableProducer --topic test_topic --broker-list worker11:9095,worker8:9095,worker4:9095 --throughput 100 --producer.config /mnt/verifiable_producer/verifiable_producer.properties 2>> /mnt/verifiable_producer/verifiable_producer.stdout | tee -a /mnt/verifiable_producer/verifiable_producer.stdout &
[DEBUG - 2017-07-18 14:25:45,761 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:46,267 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:46,773 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:47,279 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:47,785 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:48,291 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:48,797 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:49,303 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:49,808 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:50,314 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:50,821 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:51,327 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:51,833 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:52,339 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:52,844 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:53,350 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:53,856 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:54,362 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[DEBUG - 2017-07-18 14:25:54,868 - remoteaccount - _log - lineno:158]: ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[INFO  - 2017-07-18 14:25:55,374 - background_thread - _protected_worker - lineno:38]: BackgroundThreadService threw exception:
[INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - lineno:39]: Traceback (most recent call last):
  File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py", line 35, in _protected_worker
    self._worker(idx, node)
  File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py", line 261, in _worker
    self.start_jmx_tool(idx, node)
  File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py", line 73, in start_jmx_tool
    wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
  File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until
    raise TimeoutError(err_msg)
TimeoutError: ubuntu@worker7: Jmx tool took too long to start
{quote}

You can see the calls to nc initially, then checking for the availability of the JMX port. There's some other output unrelated because the jmx stuff is happening in a background thread. Then you can see it start checking for the log file, but it never seems to show up.

In this case there are logs for jmx (whereas I've seen some other tests where there aren't):

{quote}
$ cat jmx_tool.err.log
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi.
{quote}

{quote}
$ cat jmx_tool.log
{quote}

So no output was generated, but it did at least log itself trying to connect.

And unlike some cases I've seen before where the timeouts were a problem, the range of timestamps in the consumer log seems to overlap plenty with the tests for the log file:

{quote}
[2017-07-18 14:25:45,514] INFO ConsumerConfig values:
        auto.commit.interval.ms = 5000
...
[2017-07-18 14:26:46,963] TRACE Completed receive from node 2147483645 for LEAVE_GROUP with correlation id 5854, received {throttle_time_ms=0,error_code=0} (org.apache.kafka.clients.NetworkClient)
[2017-07-18 14:26:46,963] DEBUG LeaveGroup request for group group returned successfully (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{quote}

Unfortunately, we can't tell about JmxTool itself because it doesn't log any timestamp information.

> System test failure due to timeout starting Jmx tool
> ----------------------------------------------------
>
>                 Key: KAFKA-5608
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5608
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Ewen Cheslack-Postava
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - lineno:39]: Traceback (most recent call last):
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py", line 35, in _protected_worker
>     self._worker(idx, node)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py", line 261, in _worker
>     self.start_jmx_tool(idx, node)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py", line 73, in start_jmx_tool
>     wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until
>     raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: RunnerClient: kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL: FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
>     data = self.run_test()
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py", line 148, in test_rolling_upgrade_phase_two
>     self.run_produce_consume_validate(self.roll_in_secured_settings, client_protocol, broker_protocol)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 106, in run_produce_consume_validate
>     self.start_producer_and_consumer()
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 79, in start_producer_and_consumer
>     self.consumer_start_timeout_sec)
>   File "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until
>     raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)