Posted to commits@cassandra.apache.org by "Andres de la Peña (Jira)" <ji...@apache.org> on 2022/09/01 10:58:00 UTC
[jira] [Updated] (CASSANDRA-17872) Dtests failing intermittently on Jolokia agent
[ https://issues.apache.org/jira/browse/CASSANDRA-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andres de la Peña updated CASSANDRA-17872:
------------------------------------------
Description:
Some apparently unrelated Python dtests fail with an output of the form:
{code:java}
Error Message
subprocess.CalledProcessError: Command '('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandra/cassandra/cassandra-dtest/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', 'start', '706')' returned non-zero exit status 1.
Stacktrace
self = <auth_test.TestAuthRoles object at 0x7fc6cb4313a0>
(...)
mbean = make_mbean('auth', type='RolesCache')
> with JolokiaAgent(self.cluster.nodelist()[0]) as jmx:
auth_test.py:1888:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tools/jmxutils.py:309: in __enter__
self.start()
tools/jmxutils.py:187: in start
subprocess.check_output(args, stderr=subprocess.STDOUT)
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input = None, capture_output = False, timeout = None, check = True
popenargs = (('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandr...t/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', ...),)
kwargs = {'stderr': -2, 'stdout': -1}
process = <subprocess.Popen object at 0x7fc6c9afb910>
stdout = b"Couldn't start agent for PID 706\nPossible reason could be that port '8778' is already occupied.\nPlease check the standard output of the target process for a detailed error message.\n"
stderr = None, retcode = 1
(...)
if check and retcode:
> raise CalledProcessError(retcode, process.args,
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandra/cassandra/cassandra-dtest/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', 'start', '706')' returned non-zero exit status 1.
/usr/lib/python3.8/subprocess.py:516: CalledProcessError
{code}
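The agent's own diagnostic only appears in the {{stdout}} local of the traceback above, not in the {{CalledProcessError}} message itself. A minimal sketch of how that captured output could be surfaced when {{subprocess.check_output}} fails. The helper name {{run_and_report}} and the stand-in command are assumptions for illustration and are not part of cassandra-dtest:

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command; on failure, re-raise with the captured output in the message.

    Hypothetical helper for illustration only; not part of cassandra-dtest.
    """
    try:
        # Same call shape as in the traceback: stderr is merged into stdout.
        return subprocess.check_output(args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        # exc.output holds the merged stdout/stderr bytes of the child process.
        details = exc.output.decode(errors="replace")
        raise RuntimeError(
            f"command exited with status {exc.returncode}: {details}"
        ) from exc


if __name__ == "__main__":
    # Stand-in command (an assumption) that prints a message and exits with 1.
    cmd = [sys.executable, "-c", "print('port occupied'); raise SystemExit(1)"]
    try:
        run_and_report(cmd)
    except RuntimeError as err:
        print(err)
```

With something along these lines, the "Couldn't start agent" text would show up directly in the failure message rather than only in the captured locals.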
Here are a bunch of hits on multiple branches:
* [https://app.circleci.com/pipelines/github/adelapena/cassandra/2035/workflows/1e06bd6d-8bd6-4703-85db-2b41e964134e/jobs/20403]
* [https://ci-cassandra.apache.org/job/Cassandra-3.11/387/testReport/dtest-novnode.thrift_hsha_test/TestThriftHSHA/test_closing_connections/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.0/454/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairLegacyStreaming/test_transient_incremental_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.0/461/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_failed_read_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.0/461/testReport/dtest-novnode.transient_replication_test/TestTransientReplication/test_cheap_quorums/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.0/464/testReport/dtest-offheap.repair_tests.incremental_repair_test/TestIncRepair/test_parent_repair_session_cleanup/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.0/465/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairLegacyStreaming/test_transient_incremental_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.0/465/testReport/dtest-offheap.repair_tests.incremental_repair_test/TestIncRepair/test_repaired_tracking_with_partition_deletes/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/135/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairStreamEntireSSTable/test_primary_range_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/135/testReport/dtest.auth_test/TestNetworkAuth/test_revoked_login/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/145/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairLegacyStreaming/test_primary_range_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/148/testReport/dtest-novnode.auth_test/TestAuthRoles/test_role_caching_authenticated_user/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/151/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_speculative_data_request/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/151/testReport/dtest.read_repair_test/TestSpeculativeReadRepair/test_quorum_requirement_on_speculated_read/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1288/testReport/dtest.jmx_test/TestJMX/test_mv_metric_mbeans_release/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1295/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_paxos/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1295/testReport/dtest-offheap.read_repair_test/TestSpeculativeReadRepair/test_quorum_requirement/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1296/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairStreamEntireSSTable/test_speculative_write_repair_cycle/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1296/testReport/dtest-offheap.configuration_test/TestConfiguration/test_change_durable_writes/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1300/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_failed_read_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1300/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairStreamEntireSSTable/test_optimized_primary_range_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1301/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_batch_and_slice/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1301/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_write_and_read/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1302/testReport/dtest-upgrade.upgrade_tests.regression_test/TestForRegressionsUpgrade_current_3_11_x_To_indev_trunk/test13294/]
Note the common {{with JolokiaAgent(self.cluster.nodelist()[0])}} and {{"Couldn't start agent for PID 1224\nPossible reason could be that port '8778' is already occupied.\nPlease check the standard output of the target process for a detailed error message.\n"}} parts.
So far, the issue doesn't seem to reproduce on 3.0.
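The agent output only names an occupied port as a possible reason, so it may be worth confirming. A minimal sketch for checking whether something is already listening on the port before the agent is attached. The helper name {{port_in_use}} is hypothetical; the host {{127.0.0.1}} and port {{8778}} are taken from the output above:

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration only; a successful connect means
    some process is currently listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    # Host and port as reported in the failing runs.
    print(port_in_use("127.0.0.1", 8778))
```

This only reports the state at the moment of the check, so another process could still grab the port between the check and the agent start.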
was:
Some apparently unrelated Python dtests fail with an output of the form:
{code:java}
Error Message
subprocess.CalledProcessError: Command '('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandra/cassandra/cassandra-dtest/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', 'start', '706')' returned non-zero exit status 1.
Stacktrace
self = <auth_test.TestAuthRoles object at 0x7fc6cb4313a0>
def test_role_caching_authenticated_user(self):
"""
This test is to show that the role caching in AuthenticatedUser
works correctly and revokes the roles from a logged in user
* Launch a one node cluster, with a roles cache of 2s
* Connect as the default superuser
* Create ROLES role1 and mike
* Grant permissions to role1, and role1 to mike
* Verify mike can perform expected operations
* Revoke role1, and thus read permissions, from mike.
* Try reading as mike, and verify that eventually the cache expires and it fails.
"""
# on older versions the cache is not initialized until used,
# we need the MBean registered so let's use it
if self.dtest_config.cassandra_version_from_build < '4.0':
self.superuser.execute("LIST ROLES")
mbean = make_mbean('auth', type='RolesCache')
> with JolokiaAgent(self.cluster.nodelist()[0]) as jmx:
auth_test.py:1888:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tools/jmxutils.py:309: in __enter__
self.start()
tools/jmxutils.py:187: in start
subprocess.check_output(args, stderr=subprocess.STDOUT)
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input = None, capture_output = False, timeout = None, check = True
popenargs = (('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandr...t/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', ...),)
kwargs = {'stderr': -2, 'stdout': -1}
process = <subprocess.Popen object at 0x7fc6c9afb910>
stdout = b"Couldn't start agent for PID 706\nPossible reason could be that port '8778' is already occupied.\nPlease check the standard output of the target process for a detailed error message.\n"
stderr = None, retcode = 1
def run(*popenargs,
input=None, capture_output=False, timeout=None, check=False, **kwargs):
"""Run command with arguments and return a CompletedProcess instance.
The returned instance will have attributes args, returncode, stdout and
stderr. By default, stdout and stderr are not captured, and those attributes
will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
If check is True and the exit code was non-zero, it raises a
CalledProcessError. The CalledProcessError object will have the return code
in the returncode attribute, and output & stderr attributes if those streams
were captured.
If timeout is given, and the process takes too long, a TimeoutExpired
exception will be raised.
There is an optional argument "input", allowing you to
pass bytes or a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it will be used internally.
By default, all communication is in bytes, and therefore any "input" should
be bytes, and the stdout and stderr will be bytes. If in text mode, any
"input" should be a string, and stdout and stderr will be strings decoded
according to locale encoding, or by "encoding" if set. Text mode is
triggered by setting any of text, encoding, errors or universal_newlines.
The other arguments are the same as for the Popen constructor.
"""
if input is not None:
if kwargs.get('stdin') is not None:
raise ValueError('stdin and input arguments may not both be used.')
kwargs['stdin'] = PIPE
if capture_output:
if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
raise ValueError('stdout and stderr arguments may not be used '
'with capture_output.')
kwargs['stdout'] = PIPE
kwargs['stderr'] = PIPE
with Popen(*popenargs, **kwargs) as process:
try:
stdout, stderr = process.communicate(input, timeout=timeout)
except TimeoutExpired as exc:
process.kill()
if _mswindows:
# Windows accumulates the output in a single blocking
# read() call run on child threads, with the timeout
# being done in a join() on those threads. communicate()
# _after_ kill() is required to collect that and add it
# to the exception.
exc.stdout, exc.stderr = process.communicate()
else:
# POSIX _communicate already populated the output so
# far into the TimeoutExpired exception.
process.wait()
raise
except: # Including KeyboardInterrupt, communicate handled that.
process.kill()
# We don't call process.wait() as .__exit__ does that for us.
raise
retcode = process.poll()
if check and retcode:
> raise CalledProcessError(retcode, process.args,
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandra/cassandra/cassandra-dtest/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', 'start', '706')' returned non-zero exit status 1.
/usr/lib/python3.8/subprocess.py:516: CalledProcessError
{code}
Here are a bunch of hits on multiple branches:
* [https://app.circleci.com/pipelines/github/adelapena/cassandra/2035/workflows/1e06bd6d-8bd6-4703-85db-2b41e964134e/jobs/20403]
* [https://ci-cassandra.apache.org/job/Cassandra-3.11/387/testReport/dtest-novnode.thrift_hsha_test/TestThriftHSHA/test_closing_connections/]
* [https://ci-cassandra.apache.org/job/Cassandra-4.1/148/testReport/dtest-novnode.auth_test/TestAuthRoles/test_role_caching_authenticated_user/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1300/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_failed_read_repair/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1295/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_paxos/]
* [https://ci-cassandra.apache.org/job/Cassandra-trunk/1301/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_batch_and_slice/]
Note the common {{with JolokiaAgent(self.cluster.nodelist()[0])}} and {{"Couldn't start agent for PID 1224\nPossible reason could be that port '8778' is already occupied.\nPlease check the standard output of the target process for a detailed error message.\n"}} parts.
> Dtests failing intermittently on Jolokia agent
> ----------------------------------------------
>
> Key: CASSANDRA-17872
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17872
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: Andres de la Peña
> Priority: Normal
>
> Some apparently unrelated Python dtests fail with an output of the form:
> {code:java}
> Error Message
> subprocess.CalledProcessError: Command '('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandra/cassandra/cassandra-dtest/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', 'start', '706')' returned non-zero exit status 1.
> Stacktrace
> self = <auth_test.TestAuthRoles object at 0x7fc6cb4313a0>
> (...)
>
> mbean = make_mbean('auth', type='RolesCache')
> > with JolokiaAgent(self.cluster.nodelist()[0]) as jmx:
> auth_test.py:1888:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> tools/jmxutils.py:309: in __enter__
> self.start()
> tools/jmxutils.py:187: in start
> subprocess.check_output(args, stderr=subprocess.STDOUT)
> /usr/lib/python3.8/subprocess.py:415: in check_output
> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> input = None, capture_output = False, timeout = None, check = True
> popenargs = (('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandr...t/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', ...),)
> kwargs = {'stderr': -2, 'stdout': -1}
> process = <subprocess.Popen object at 0x7fc6c9afb910>
> stdout = b"Couldn't start agent for PID 706\nPossible reason could be that port '8778' is already occupied.\nPlease check the standard output of the target process for a detailed error message.\n"
> stderr = None, retcode = 1
> (...)
> if check and retcode:
> > raise CalledProcessError(retcode, process.args,
> output=stdout, stderr=stderr)
> E subprocess.CalledProcessError: Command '('/usr/lib/jvm/java-8-openjdk-amd64/bin/java', '-cp', '/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/cassandra/cassandra/cassandra-dtest/tools/../lib/jolokia-jvm-1.7.1-agent.jar', 'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1', 'start', '706')' returned non-zero exit status 1.
> /usr/lib/python3.8/subprocess.py:516: CalledProcessError
> {code}
> Here are a bunch of hits on multiple branches:
> * [https://app.circleci.com/pipelines/github/adelapena/cassandra/2035/workflows/1e06bd6d-8bd6-4703-85db-2b41e964134e/jobs/20403]
> * [https://ci-cassandra.apache.org/job/Cassandra-3.11/387/testReport/dtest-novnode.thrift_hsha_test/TestThriftHSHA/test_closing_connections/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.0/454/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairLegacyStreaming/test_transient_incremental_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.0/461/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_failed_read_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.0/461/testReport/dtest-novnode.transient_replication_test/TestTransientReplication/test_cheap_quorums/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.0/464/testReport/dtest-offheap.repair_tests.incremental_repair_test/TestIncRepair/test_parent_repair_session_cleanup/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.0/465/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairLegacyStreaming/test_transient_incremental_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.0/465/testReport/dtest-offheap.repair_tests.incremental_repair_test/TestIncRepair/test_repaired_tracking_with_partition_deletes/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.1/135/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairStreamEntireSSTable/test_primary_range_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.1/135/testReport/dtest.auth_test/TestNetworkAuth/test_revoked_login/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.1/145/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairLegacyStreaming/test_primary_range_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.1/148/testReport/dtest-novnode.auth_test/TestAuthRoles/test_role_caching_authenticated_user/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.1/151/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_speculative_data_request/]
> * [https://ci-cassandra.apache.org/job/Cassandra-4.1/151/testReport/dtest.read_repair_test/TestSpeculativeReadRepair/test_quorum_requirement_on_speculated_read/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1288/testReport/dtest.jmx_test/TestJMX/test_mv_metric_mbeans_release/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1295/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_paxos/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1295/testReport/dtest-offheap.read_repair_test/TestSpeculativeReadRepair/test_quorum_requirement/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1296/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairStreamEntireSSTable/test_speculative_write_repair_cycle/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1296/testReport/dtest-offheap.configuration_test/TestConfiguration/test_change_durable_writes/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1300/testReport/dtest-novnode.read_repair_test/TestSpeculativeReadRepair/test_failed_read_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1300/testReport/dtest-novnode.transient_replication_test/TestTransientReplicationRepairStreamEntireSSTable/test_optimized_primary_range_repair/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1301/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_batch_and_slice/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1301/testReport/dtest-novnode.client_request_metrics_local_remote_test/TestClientRequestMetricsLocalRemote/test_write_and_read/]
> * [https://ci-cassandra.apache.org/job/Cassandra-trunk/1302/testReport/dtest-upgrade.upgrade_tests.regression_test/TestForRegressionsUpgrade_current_3_11_x_To_indev_trunk/test13294/]
> Note the common {{with JolokiaAgent(self.cluster.nodelist()[0])}} and {{"Couldn't start agent for PID 1224\nPossible reason could be that port '8778' is already occupied.\nPlease check the standard output of the target process for a detailed error message.\n"}} parts.
> So far, the issue doesn't seem to reproduce on 3.0.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)