You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Ilya Shishkov (Jira)" <ji...@apache.org> on 2021/04/29 15:36:00 UTC
[jira] [Updated] (IGNITE-14671) Test
IgniteClusterSnapshotCheckTest#testClusterSnapshotCheckOtherCluster is
flaky
[ https://issues.apache.org/jira/browse/IGNITE-14671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Shishkov updated IGNITE-14671:
-----------------------------------
Description:
To reproduce failure, run it several times, for example, set up IDE to run test with 'repeat until failure' option. Then you will get an assertion error:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
With applied patch [1] exception would be as follows:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
It seems to be a concurrent update problem of thread unsafe HasSet:
{code:java|title=Unsafe HashSet}
Set<UUID> assigns = new HashSet<>();
{code}
{code:java|title=Concurrent update}
grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() {
@Override public void onMessage(UUID nodeId, Object msg, byte plc) {
if (msg instanceof GridJobExecuteRequest) {
GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;
if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
assigns.add(locNodeId);
}
}
});
{code}
With concurrent Set implementation problem is not reproducing (see patch [2]):
{code:java}
Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
{code}
# [^testClusterSnapshotCheckOtherCluster_printCount.patch]
was:
To reproduce failure, run it several times, for example, set up IDE to run test with 'repeat until failure' option. Then you will get an assertion error:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
With applied patch [1] exception would be as follows:
{code:java}
java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
{code}
It seems to be a concurrent update problem of thread unsafe HasSet:
{code:java|title=Unsafe HashSet}
Set<UUID> assigns = new HashSet<>();
{code}
{code:java|title=Concurrent update}
grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() {
@Override public void onMessage(UUID nodeId, Object msg, byte plc) {
if (msg instanceof GridJobExecuteRequest) {
GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;
if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
assigns.add(locNodeId);
}
}
});
{code}
With concurrent Set implementation problem is not reproducing (see patch [2]):
{code:java}
Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
{code}
> Test IgniteClusterSnapshotCheckTest#testClusterSnapshotCheckOtherCluster is flaky
> ---------------------------------------------------------------------------------
>
> Key: IGNITE-14671
> URL: https://issues.apache.org/jira/browse/IGNITE-14671
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.11
> Reporter: Ilya Shishkov
> Assignee: Ilya Shishkov
> Priority: Trivial
> Labels: iep-43
> Attachments: testClusterSnapshotCheckOtherCluster_printCount.patch
>
>
> To reproduce failure, run it several times, for example, set up IDE to run test with 'repeat until failure' option. Then you will get an assertion error:
> {code:java}
> java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
> at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
> {code}
> With applied patch [1] exception would be as follows:
> {code:java}
> java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
> at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
> {code}
> It seems to be a concurrent update problem of thread unsafe HasSet:
> {code:java|title=Unsafe HashSet}
> Set<UUID> assigns = new HashSet<>();
> {code}
> {code:java|title=Concurrent update}
> grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() {
> @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
> if (msg instanceof GridJobExecuteRequest) {
> GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;
> if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
> assigns.add(locNodeId);
> }
> }
> });
> {code}
> With concurrent Set implementation problem is not reproducing (see patch [2]):
> {code:java}
> Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
> {code}
> # [^testClusterSnapshotCheckOtherCluster_printCount.patch]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)