You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Barrett Oglesby (Jira)" <ji...@apache.org> on 2019/10/30 21:30:00 UTC

[jira] [Commented] (GEODE-7387) CI failure: SingleHopGetAllPutAllDUnitTest > testRemoveAllInClient failed with AssertionError

    [ https://issues.apache.org/jira/browse/GEODE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963447#comment-16963447 ] 

Barrett Oglesby commented on GEODE-7387:
----------------------------------------

I added some logging and ran the test locally. It passes fine, but I see how it could fail.

As soon as the putAll starts, metadata retrieval is scheduled:
{noformat}
[vm3] [warn 2019/10/30 12:14:42.009 PDT <RMI TCP Connection(1)-127.0.0.1> tid=0x12 ClientMetadataService.scheduleGetPRMetaData region=/TestPartitionedRegion
[vm3] java.lang.Exception
[vm3] 	at org.apache.geode.cache.client.internal.ClientMetadataService.scheduleGetPRMetaData(ClientMetadataService.java:498)
[vm3] 	at org.apache.geode.cache.client.internal.ClientMetadataService.getServerToFilterMap(ClientMetadataService.java:209)
[vm3] 	at org.apache.geode.cache.client.internal.ClientMetadataService.getServerToFilterMap(ClientMetadataService.java:201)
[vm3] 	at org.apache.geode.cache.client.internal.PutAllOp.execute(PutAllOp.java:94)
[vm3] 	at org.apache.geode.cache.client.internal.ServerRegionProxy.putAll(ServerRegionProxy.java:592)
[vm3] 	at org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:8913)
[vm3] 	at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8846)
[vm3] 	at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8858)
[vm3] 	at org.apache.geode.internal.cache.execute.SingleHopGetAllPutAllDUnitTest.lambda$testRemoveAllInClient$bb17a952$1(SingleHopGetAllPutAllDUnitTest.java:160)
{noformat}
Metadata retrieval is spun off to a Function Execution Thread for processing. Once metadata retrieval starts, the region's clientMetaDataLock is taken, and the ClientPartitionAdvisor is added to the clientPRAdvisors:
{noformat}
[vm3] [warn 2019/10/30 12:14:42.011 PDT <Function Execution Thread-1> tid=0x35 ClientMetadataService.getClientPRMetadata got lock region=/TestPartitionedRegion

[vm3] [warn 2019/10/30 12:14:42.013 PDT <Function Execution Thread-1> tid=0x35 ClientMetadataService.addClientPartitionAdvisor regionFullPath=/TestPartitionedRegion
[vm3] java.lang.Exception
[vm3] 	at org.apache.geode.cache.client.internal.ClientMetadataService.addClientPartitionAdvisor(ClientMetadataService.java:767)
[vm3] 	at org.apache.geode.cache.client.internal.ClientMetadataService.getClientPRMetadata(ClientMetadataService.java:563)
[vm3] 	at org.apache.geode.cache.client.internal.ClientMetadataService$1.run(ClientMetadataService.java:526)
[vm3] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[vm3] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[vm3] 	at java.lang.Thread.run(Thread.java:748)
{noformat}
So, now the metadata is being retrieved, and the clientPRAdvisors is not null.

Then, the putAll happens, and verifyMetadata is called.

That method calls getClientPRMetadata:
{noformat}
cms.getClientPRMetadata((LocalRegion) region);
{noformat}
If client metadata is still being retrieved by a Function Execution thread, this call returns immediately because that thread has the lock:
{noformat}
[vm3] [warn 2019/10/30 12:14:42.435 PDT <RMI TCP Connection(1)-127.0.0.1> tid=0x12 ClientMetadataService.getClientPRMetadata did not get lock region=/TestPartitionedRegion
{noformat}
If that happens, the test will fail with the AssertionError.

Thats because verifyMetadata waits for non-empty clientPRAdvisors, and it won't be empty in this case:
{noformat}
final Map<String, ClientPartitionAdvisor> regionMetaData = cms.getClientPRMetadata_TEST_ONLY();
await().until(() -> regionMetaData.size() > 0);
{noformat}
btw - I think that code is wrong. It gets the clientPRAdvisors once outside the await call. If its empty from the start, its always going to be empty.

In any event, once the code gets past this check, it gets into the assert that fails:
{noformat}
await().until(() -> {
  ClientPartitionAdvisor prMetaData = regionMetaData.get(region.getFullPath());
  assertThat(prMetaData).isNotNull();
  assertThat(prMetaData.adviseRandomServerLocation()).isNotNull();
  return true;
});
{noformat}
Another btw - I'm not sure that await is doing anything. The assert fails immediately and throws the exception. It doesn't wait 5 minutes.

At least one change that might be better is for the verifyMetadata method to check isMetadataStable rather than getClientPRMetadata_TEST_ONLY.

> CI failure: SingleHopGetAllPutAllDUnitTest > testRemoveAllInClient failed with AssertionError
> ---------------------------------------------------------------------------------------------
>
>                 Key: GEODE-7387
>                 URL: https://issues.apache.org/jira/browse/GEODE-7387
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Barrett Oglesby
>            Priority: Major
>
> DistributedTestOpenJDK8 build 1252:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/1252
> The client starts the testRemoveAllInClient task:
> {noformat}
> [vm3] [info 2019/10/30 12:17:46.836 GMT <RMI TCP Connection(1)-172.17.0.4> tid=0x20] Received method: org.apache.geode.test.dunit.NamedRunnable.run with 0 args on object: runnable(testRemoveAllInClient)
> {noformat}
> It does a putAll (which creates the buckets on the servers):
> {noformat}
> [vm1] [info 2019/10/30 12:17:46.959 GMT <ServerConnection on port 25248 Thread 2> tid=0x81] Initializing region _B__TestPartitionedRegion_6
> [vm2] [info 2019/10/30 12:17:47.030 GMT <Pooled Waiting Message Processor 1> tid=0x5a] Initializing region _B__TestPartitionedRegion_6
> [vm0] [info 2019/10/30 12:17:47.083 GMT <Pooled Waiting Message Processor 1> tid=0x53] Initializing region _B__TestPartitionedRegion_6
> [vm0] [info 2019/10/30 12:17:47.150 GMT <Pooled Waiting Message Processor 2> tid=0x89] Initializing region _B__TestPartitionedRegion_4
> [vm2] [info 2019/10/30 12:17:47.178 GMT <Pooled Waiting Message Processor 1> tid=0x5a] Initializing region _B__TestPartitionedRegion_4
> ...
> {noformat}
> The putAll timeouts out once (read-timeout is set to 2000 ms):
> {noformat}
> [vm3] [warn 2019/10/30 12:17:48.942 GMT <RMI TCP Connection(1)-172.17.0.4> tid=0x20] Pool unexpected socket timed out on client connection=Pooled Connection to 23c02ffd4a40:25248: Connection[23c02ffd4a40:25248]@264362003)
> {noformat}
> It must succeed after retry (retry-attempts is set to 2) because the verifyMetadata method is executed, and the assertion fails:
> {noformat}
> [vm3] [info 2019/10/30 12:17:49.429 GMT <RMI TCP Connection(1)-172.17.0.4> tid=0x20] Got result: EXCEPTION_OCCURRED
> [vm3] java.lang.AssertionError: 
> [vm3] Expecting actual not to be null
> [vm3] 	at org.apache.geode.internal.cache.execute.SingleHopGetAllPutAllDUnitTest.lambda$verifyMetadata$1(SingleHopGetAllPutAllDUnitTest.java:247)
> [vm3] 	at org.awaitility.core.CallableCondition$ConditionEvaluationWrapper.eval(CallableCondition.java:100)
> [vm3] 	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:192)
> [vm3] 	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:179)
> [vm3] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [vm3] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [vm3] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [vm3] 	at java.lang.Thread.run(Thread.java:748)
> [vm3]  from org.apache.geode.test.dunit.NamedRunnable.run with 0 args on object: runnable(testRemoveAllInClient) (took 2592 ms)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)