Posted to jira@kafka.apache.org by "Randall Hauch (JIRA)" <ji...@apache.org> on 2018/02/21 20:49:00 UTC

[jira] [Updated] (KAFKA-6577) Connect standalone SASL file source and sink test fails without explanation

     [ https://issues.apache.org/jira/browse/KAFKA-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch updated KAFKA-6577:
---------------------------------
    Description: The {{tests/kafkatest/tests/connect/connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink}} test is failing with the SASL configuration without a sufficient explanation. During the test, the Connect worker fails to start, but the Connect log contains no useful information.  (was: The {{tests/kafkatest/tests/connect/connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink}} test is failing with the SASL configuration without a sufficient explanation. During the test, the Connect worker fails to start, but the Connect log contains no useful information.

There are actually several things compounding to cause the failure and make the problem difficult to understand.

First, the {{tests/kafkatest/tests/connect/templates/connect_standalone.properties}} template only adds the broker's security configuration with the "producer." and "consumer." prefixes, and does not also add it without a prefix. The worker uses the AdminClient to connect to the broker to get the Kafka cluster ID and to manage the three internal topics, and the AdminClient is configured via top-level properties. Because the SASL test requires that all clients connect using SASL, the missing top-level security configs meant the AdminClient was attempting and failing to connect to the broker. This is corrected by adding the broker's security configuration to the Connect worker configuration file at the top level. (This was already being done in the {{connect_distributed.properties}} file.)

Second, the default {{request.timeout.ms}} for the AdminClient (and the other clients) is 120 seconds, so the AdminClient was retrying for 120 seconds before it would give up and throw an error. However, the test was only waiting for 60 seconds before determining that the service failed to start, so the test gave up before the error could surface. This can be corrected by setting {{request.timeout.ms=10000}} in the Connect worker configurations (both distributed and standalone).

Third, the Connect workers were recently changed to look up the Kafka cluster ID before starting the herder. This is unlike the older uses of the AdminClient to find and manage the internal topics, where a failure to connect was not necessarily logged correctly but was nevertheless skipped over, relying upon broker auto-topic creation to create the internal topics. (This may be why the test did not fail prior to the recent change to always require a successful AdminClient connection.) Although the worker never got this far in its startup process, the fact that we missed such an error since the prior releases means that failure to connect with the AdminClient was not being properly reported.)
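
As a rough illustration of the first two corrections in the earlier description (security configs at the top level as well as under the prefixes, plus a shorter request timeout), the worker properties could be assembled along these lines. This is only a sketch: {{render_worker_props}} is a hypothetical helper, not part of the kafkatest code, and the SASL values are placeholders rather than what the test actually generates.

```python
def render_worker_props(security_props, request_timeout_ms=10000):
    """Hypothetical helper: build Connect worker property lines.

    The security settings are emitted three times: once with no prefix
    (read by the worker's AdminClient) and once each under the
    "producer." and "consumer." prefixes.
    """
    lines = []
    for prefix in ("", "producer.", "consumer."):
        for key, value in sorted(security_props.items()):
            lines.append("%s%s=%s" % (prefix, key, value))
    # 10 seconds keeps a failed connection well inside the test's 60 second wait.
    lines.append("request.timeout.ms=%d" % request_timeout_ms)
    return lines


# Placeholder values; the real test derives these from the broker's security setup.
sasl_props = {"security.protocol": "SASL_PLAINTEXT", "sasl.mechanism": "GSSAPI"}
worker_props = render_worker_props(sasl_props)
print("\n".join(worker_props))
```

With the timeout at 10 seconds, a worker that cannot authenticate should report its error before the test's 60 second startup check expires, instead of still retrying when the test gives up.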

> Connect standalone SASL file source and sink test fails without explanation
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-6577
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6577
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, system tests
>    Affects Versions: 1.1.0
>            Reporter: Randall Hauch
>            Assignee: Randall Hauch
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> The {{tests/kafkatest/tests/connect/connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink}} test is failing with the SASL configuration without a sufficient explanation. During the test, the Connect worker fails to start, but the Connect log contains no useful information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)