Posted to dev@subversion.apache.org by James McCoy <ja...@jamessan.com> on 2021/11/22 00:35:24 UTC

Re: JavaHL test failure and warning in 1.14.1

On Wed, Jul 28, 2021 at 11:55:56PM +0300, Alexandr Miloslavskiy wrote:
> I have tested on Ubuntu 20.04 (on x86-64 arch):
> 
> * Ran entire JavaHL test package 130 times
>   (using a loop in shell script).
>   Not a single error; tests succeed every single time.
> 
> * Ran just the reported test 1000 times; again no errors.
>   The test is 'org.apache.subversion.javahl.BasicTests.testCrash_RequestChannel_nativeRead_AfterException'
> 
> Is the problem reproducible on x86-64 ?
> Is it reproducible by running tests manually ?

Yes, I just hit it while composing this email, although I hadn't in an
earlier 100x loop of the JavaHL suite.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: JavaHL test failure and warning in 1.14.1

Posted by James McCoy <ja...@jamessan.com>.
On Wed, Nov 24, 2021 at 03:10:21PM +0300, Alexandr Miloslavskiy wrote:
> On 24.11.2021 15:07, James McCoy wrote:
> > The comment says IOException, but this is InterruptedException.  Is that
> > intentional?
> 
> 
> Catching InterruptedException is simply to be able to call
> 'tunnelAgent.join()'. The comment is correct. The difference between
> 'tunnelAgent.join()' and 'tunnelAgent.joinAndTest()' is testing for
> IOException.

Thanks!  This patch has worked for me, as well.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: JavaHL test failure and warning in 1.14.1

Posted by Alexandr Miloslavskiy <al...@syntevo.com>.
On 24.11.2021 15:07, James McCoy wrote:
> The comment says IOException, but this is InterruptedException.  Is that
> intentional?


Catching InterruptedException is simply to be able to call 
'tunnelAgent.join()'. The comment is correct. The difference between 
'tunnelAgent.join()' and 'tunnelAgent.joinAndTest()' is testing for 
IOException.
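
To make the distinction concrete, here is a minimal sketch. It is not the
code from BasicTests.java: the 'Agent' class, its 'failure' field and the
'simulateIoError' flag are hypothetical stand-ins, and the real
'joinAndTest()' may report failures differently. It only assumes what is
stated above, namely that 'joinAndTest()' checks for an IOException after
joining while plain 'join()' does not.

```java
import java.io.IOException;

class AgentJoinSketch {
    /** Hypothetical stand-in for the test's TunnelAgent thread. */
    static class Agent extends Thread {
        private volatile IOException failure;   // recorded by run(), if any
        private final boolean simulateIoError;  // illustration only

        Agent(boolean simulateIoError) {
            this.simulateIoError = simulateIoError;
        }

        @Override
        public void run() {
            try {
                if (simulateIoError) {
                    throw new IOException("Pipe closed");
                }
            } catch (IOException e) {
                failure = e; // remember the error for joinAndTest()
            }
        }

        /** Wait for the thread, then fail if it recorded an IOException. */
        void joinAndTest() {
            try {
                join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            if (failure != null) {
                throw new AssertionError("agent failed", failure);
            }
        }
    }

    /** Plain join(): only waits, so the recorded IOException is ignored. */
    static boolean plainJoinIgnoresFailure() {
        Agent agent = new Agent(true);
        agent.start();
        try {
            agent.join(); // InterruptedException must be caught to call this
        } catch (InterruptedException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }

    /** joinAndTest(): the same IOException now fails the caller. */
    static boolean joinAndTestReportsFailure() {
        Agent agent = new Agent(true);
        agent.start();
        try {
            agent.joinAndTest();
            return false;
        } catch (AssertionError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("join() ignores failure: " + plainJoinIgnoresFailure());
        System.out.println("joinAndTest() reports failure: " + joinAndTestReportsFailure());
    }
}
```

In this sketch the InterruptedException handler exists only because
'Thread.join()' declares it; it has nothing to do with the IOException
that is being ignored.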

Re: JavaHL test failure and warning in 1.14.1

Posted by James McCoy <ja...@jamessan.com>.
On Wed, Nov 24, 2021 at 02:13:20AM +0300, Alexandr Miloslavskiy wrote:
> Indeed there was a race condition where TunnelAgent could begin writing at
> the same time as the pipe was being closed. This resulted in an unexpected
> IOException, which was detected by the test.

> Please find the patch attached.

> Index: subversion/bindings/javahl/tests/org/apache/subversion/javahl/BasicTests.java
> ===================================================================
> --- subversion/bindings/javahl/tests/org/apache/subversion/javahl/BasicTests.java	(revision 1895276)
> +++ subversion/bindings/javahl/tests/org/apache/subversion/javahl/BasicTests.java	(working copy)
> @@ -4676,7 +4676,19 @@
>              // RuntimeException("Test exception") is expected here
>          }
>  
> -        tunnelAgent.joinAndTest();
> +        // In this test, there is a race condition that sometimes results in
> +        // IOException when 'WAIT_TUNNEL' tries to read from a pipe that
> +        // already has its read end closed. This is not an error, but
> +        // it's hard to distinguish this case from other IOException which
> +        // indicate a problem. To reproduce, simply wrap this test's body in
> +        // a loop. The workaround is to ignore any detected IOException.
> +        //
> +        // tunnelAgent.joinAndTest();
> +        try {
> +            tunnelAgent.join();
> +        } catch (InterruptedException e) {

The comment says IOException, but this is InterruptedException.  Is that
intentional?

> +            e.printStackTrace ();
> +        }
>      }
>  
>      /**


-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: JavaHL test failure and warning in 1.14.1

Posted by Alexandr Miloslavskiy <al...@syntevo.com>.
Indeed there was a race condition where TunnelAgent could begin writing 
at the same time as the pipe was being closed. This resulted in an 
unexpected IOException, which was detected by the test.

This is purely a test issue and should not be a problem for real 
applications.

Unfortunately it's hard to tell this benign case apart from a real 
failure inside the test, so I decided to weaken the test to get rid of 
the spurious failures.

To reproduce the failure reliably, wrap the body of 
'testCrash_RequestChannel_nativeRead_AfterException' in an infinite loop.

Please find the patch attached.
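
For illustration, the losing side of such a race can be shown
deterministically with a small sketch. This is an assumption-laden
stand-in, not the JavaHL tunnel code: it uses 'java.io.PipedInputStream'
and 'java.io.PipedOutputStream' rather than whatever channel type the
test actually uses, and the class and method names are made up. It
closes the read end first and then writes, which is the ordering the
race produces only occasionally.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

class PipeClosedSketch {
    /**
     * Returns true if a write that arrives after the read end was closed
     * fails with an IOException.
     */
    static boolean writeFailsAfterReaderClose() {
        PipedInputStream readEnd = new PipedInputStream();
        PipedOutputStream writeEnd;
        try {
            writeEnd = new PipedOutputStream(readEnd); // connect both ends
            readEnd.close();                           // reader goes away first
        } catch (IOException e) {
            // Setup is not expected to fail; surface it if it does.
            throw new UncheckedIOException(e);
        }
        try {
            writeEnd.write('x'); // the late writer loses the race
            return false;
        } catch (IOException expected) {
            return true;         // same class of error the test detected
        }
    }

    public static void main(String[] args) {
        System.out.println("late write failed: " + writeFailsAfterReaderClose());
    }
}
```

In the real test the two steps run on different threads, so which one
happens first depends on scheduling. That is why the failure only shows
up occasionally and more readily under load.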

Re: JavaHL test failure and warning in 1.14.1

Posted by Alexandr Miloslavskiy <al...@syntevo.com>.
Thanks!

I tinkered with it a bit to be able to run multiple loops in parallel 
(this requires changing '-Dtest.rootdir' to a unique absolute directory 
for each parallel run).

When running 8 loops at once, with just the offending test, I can 
reproduce the problem within about 10 seconds.

Going to investigate tomorrow.

Re: JavaHL test failure and warning in 1.14.1

Posted by James McCoy <ja...@jamessan.com>.
On Tue, Nov 23, 2021 at 12:54:49AM +0300, Alexandr Miloslavskiy wrote:
> On 22.11.2021 3:35, James McCoy wrote:
> > Yes, I just hit it while composing this email, although I hadn't in an
> > earlier 100x loop of the JavaHL suite.
> 
> Could you please give more information? :)

I set up as you described in the previous email and ran the whole suite
in a loop.

$ (set -e; for i in $(seq 200); do echo iteration $i; java -Xcheck:jni ...; done)

> Did you reproduce on x86-64?

Yes.

> Were you running the entire suite?

Yes.

> Did you run it from a loop?

Yes.

> Any ideas that would help me to reproduce?

Not off-hand. :(  Maybe it's a race condition and increasing load on
your system while the tests are running will help?

Is there anything I can _add_ to my test runs to provide useful debug
information?

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: JavaHL test failure and warning in 1.14.1

Posted by Alexandr Miloslavskiy <al...@syntevo.com>.
On 22.11.2021 3:35, James McCoy wrote:
> Yes, I just hit it while composing this email, although I hadn't in an
> earlier 100x loop of the JavaHL suite.

Could you please give more information? :)

Did you reproduce on x86-64?
Were you running the entire suite?
Did you run it from a loop?
Any ideas that would help me to reproduce?