Posted to dev@directory.apache.org by Stefan Seelmann <ma...@stefan-seelmann.de> on 2018/12/25 09:54:42 UTC

Changes in replication code and test

Hi all,

the replication tests were quite unstable: random failures and sometimes
they just hang forever.

While debugging I made some changes:
* In many tests we set the timeout for the LDAP API. Often it was set to
0 or -1 or Long.MAX_VALUE which means infinite wait time. In case of a
bug that means the test hangs forever. I removed all those, the default
timeout of 30 seconds applies now. Better fail than hang forever.
* I also removed the infinite timeout in ReplicationConsumerImpl [1],
please let me know if there is a reason to keep that.
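
To illustrate the "better fail than hang forever" point, here is a minimal
sketch using only java.util.concurrent. It is not the LDAP API or the
ReplicationConsumerImpl code; the class BoundedWait, the helper
awaitResponse and the 200 ms value are made up for this example. It shows a
bounded wait turning a response that never arrives into a fast failure:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {

    // Hypothetical helper (not part of the LDAP API): waits at most
    // timeoutMillis for the response and fails instead of blocking forever.
    static String awaitResponse(CompletableFuture<String> response, long timeoutMillis) {
        try {
            return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The bounded wait expired: report a failure to the caller.
            throw new IllegalStateException("no response within " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("operation failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Simulates a response that never arrives, e.g. because of a bug.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        try {
            awaitResponse(neverCompleted, 200);
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With an unbounded response.get() the same call would block until the test
runner is killed, which is the hang described above.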

The reason for the failures was a race condition in the LDAP API [2].

Kind Regards,
Stefan


[1]
https://github.com/apache/directory-server/commit/63001815bc135767851549022397aa5a9ba4fdda
[2]
https://github.com/apache/directory-ldap-api/commit/1118e2dcd07fb6b342d37a090850dcc0874b4cf3


Re: Changes in replication code and test

Posted by Emmanuel Lécharny <el...@gmail.com>.
Hi Stefan,

On 25/12/2018 10:54, Stefan Seelmann wrote:
> Hi all,
>
> the replication tests were quite unstable: random failures and sometimes
> they just hang forever.
>
> While debugging I made some changes:
> * In many tests we set the timeout for the LDAP API. Often it was set to
> 0 or -1 or Long.MAX_VALUE which means infinite wait time. In case of a
> bug that means the test hangs forever. I removed all those, the default
> timeout of 30 seconds applies now. Better fail than hang forever.

Yes, it does not make any sense to wait forever. If the connection is
down, it's up to the caller to deal with that.


> * I also removed the infinite timeout in ReplicationConsumerImpl [1],
> please let me know if there is a reason to keep that.


I'm pretty sure that was added for debugging purposes (it's hard to 
debug when you just have 30 s to step through the code). Removing this 
should have been done a long time ago...

>
> The reason for the failures was a race condition in the LDAP API [2].


Good catch!


Many thanks, Stefan!