You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by James McCoy <ja...@jamessan.com> on 2020/08/08 14:35:14 UTC

JNI segfault while running Java tests

The Debian builds for 1.14.0 recently started crashing while running the
Java tests.  This is pretty far out of my expertise, so hopefully
someone can help out.

I've attached the hs_err file.

The tests used to pass with OpenJDK 11.0.7+10 and we recently got an
update to 11.0.8+10.  I'm trying to run against the older version to see
if the tests still pass with that config.

Let me know if there's any other information I can provide.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: JNI segfault while running Java tests

Posted by James McCoy <ja...@jamessan.com>.
On Sun, Aug 09, 2020 at 11:10:42PM +0200, Daniel Sahlberg wrote:
> I have investigated further and I think I have found the issue. A patch is
> attached, basically changing 
>     const String::Contents key(String(m_env, jkey));
> to 
>     const String str(m_env, jkey);
>     const String::Contents key(str);
> in ImmutableMap.for_each.

Thanks! I can confirm that fixes the crash for me, as well.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: JNI segfault while running Java tests

Posted by Alexandr Miloslavskiy <al...@syntevo.com>.
On 09.08.2020 23:10, Daniel Sahlberg wrote:

> I have investigated further and I think I have found the issue. A patch 
> is attached, basically changing
>      const String::Contents key(String(m_env, jkey));
> to
>      const String str(m_env, jkey);
>      const String::Contents key(str);
> in ImmutableMap.for_each.
> 
> If I understand things correctly (admittedly I'm not an expert in C++), 
> the lifetime of the String object is just the execution of the 
> constructor of the Contents class. But the Contents class saves a 
> reference to the String object in a member variable. When the Contents 
> object is destroyed at the end of the function, it references the 
> already previously destroyed String object.

The patch looks good to me.

Here's what happens here:
1) A temporary 'String' is constructed on stack.
2) 'String::Contents' is constructed on stack, remembering address of
    'String' passed as argument.
3) Temporary 'String' goes out of scope
    It doesn't have a destructor, so nothing happens.
4) Stack region occupied by 'String' becomes free to use.
5) If compiler is smart enough, it can now reuse this stack region for
    other variables, changing the memory values.
6) If compiler is lazy, it will not reuse the stack region, so the
    values there will be intact, as if the patch was applied.
7) Depending on (5)(6) and how stack region was used, it may crash.

So it is indeed a mistake, but it often goes unnoticed.

Re: JNI segfault while running Java tests

Posted by Daniel Sahlberg <da...@gmail.com>.
Den sön 9 aug. 2020 kl 15:28 skrev Daniel Sahlberg <
daniel.l.sahlberg@gmail.com>:

> Den sön 9 aug. 2020 15:14Nathan Hartman <ha...@gmail.com> skrev:
>
>> On Sat, Aug 8, 2020 at 1:23 PM James McCoy <ja...@jamessan.com> wrote:
>>
>>> On Sat, Aug 08, 2020 at 10:35:14AM -0400, James McCoy wrote:
>>> > The Debian builds for 1.14.0 recently started crashing while running
>>> the
>>> > Java tests.  This is pretty far out of my expertise, so hopefully
>>> > someone can help out.
>>
>>
>> I don't know if it's related, but a few days ago we received a patch from
>> Alexandr Miloslavskiy to fix a crash which is caused by a garbage collected
>> object not being pinned before use by native code [1]. Perhaps Alexandr
>> found the issue because of a similar crash to the one you're experiencing.
>> Could you try the patch?
>>
>
> I thought about the same. However the patch didn't seem to make a
> difference.
>
> I can confirm James' statement that it crashes when compiled using GCC 10
> but it seems to work with GCC 9. In my case I'm using Fedora 32 versus
> Fedora 30 so I can't rule out that there are other differences but it seems
> reasonable that GCC is doing something strange.
>
> I have some done some preliminary investigations but I'm done yet, it
> seems that the code is using an object that has already been destructed.
>
> Kind regards
> Daniel
>

I have investigated further and I think I have found the issue. A patch is
attached, basically changing
    const String::Contents key(String(m_env, jkey));
to
    const String str(m_env, jkey);
    const String::Contents key(str);
in ImmutableMap.for_each.

If I understand things correctly (admittedly I'm not an expert in C++), the
lifetime of the String object is just the execution of the constructor of
the Contents class. But the Contents class saves a reference to the String
object in a member variable. When the Contents object is destroyed at the
end of the function, it references the already previously destroyed String
object.

This is the same in GCC 9 as well as GCC 10 (also the same in Visual Studio
2019!) so I'm guessing that GCC 10 is better at "cleaing up" destroyed
object to the point where it trigger a segfault (but it's not consistent as
a "minimal example" with GCC 10 show this behaviour but still doesn't
segfault).

When the String object is assigned to it's own variable it lives until the
end of the function and it is destroyed after the Contents object, thus the
destructor of the Contents class succeeds.

With this patch make check-javahl succeeds with GCC 10. I have also applied
it in my GCC 9 build and all checks still succeed.

Kind regards
Daniel

Re: JNI segfault while running Java tests

Posted by Daniel Sahlberg <da...@gmail.com>.
Den sön 9 aug. 2020 15:14Nathan Hartman <ha...@gmail.com> skrev:

> On Sat, Aug 8, 2020 at 1:23 PM James McCoy <ja...@jamessan.com> wrote:
>
>> On Sat, Aug 08, 2020 at 10:35:14AM -0400, James McCoy wrote:
>> > The Debian builds for 1.14.0 recently started crashing while running the
>> > Java tests.  This is pretty far out of my expertise, so hopefully
>> > someone can help out.
>
>
> I don't know if it's related, but a few days ago we received a patch from
> Alexandr Miloslavskiy to fix a crash which is caused by a garbage collected
> object not being pinned before use by native code [1]. Perhaps Alexandr
> found the issue because of a similar crash to the one you're experiencing.
> Could you try the patch?
>

I thought about the same. However the patch didn't seem to make a
difference.

I can confirm James' statement that it crashes when compiled using GCC 10
but it seems to work with GCC 9. In my case I'm using Fedora 32 versus
Fedora 30 so I can't rule out that there are other differences but it seems
reasonable that GCC is doing something strange.

I have some done some preliminary investigations but I'm done yet, it seems
that the code is using an object that has already been destructed.

Kind regards
Daniel


> [1]
>
> https://lists.apache.org/x/thread.html/r1eaa02dd97d7b1c2376a76ed213ac03d1956f49f209b57ec4d84fe25@%3Cdev.subversion.apache.org%3E
>
> Nathan
>
>

Re: JNI segfault while running Java tests

Posted by Nathan Hartman <ha...@gmail.com>.
On Sat, Aug 8, 2020 at 1:23 PM James McCoy <ja...@jamessan.com> wrote:

> On Sat, Aug 08, 2020 at 10:35:14AM -0400, James McCoy wrote:
> > The Debian builds for 1.14.0 recently started crashing while running the
> > Java tests.  This is pretty far out of my expertise, so hopefully
> > someone can help out.


I don't know if it's related, but a few days ago we received a patch from
Alexandr Miloslavskiy to fix a crash which is caused by a garbage collected
object not being pinned before use by native code [1]. Perhaps Alexandr
found the issue because of a similar crash to the one you're experiencing.
Could you try the patch?

[1]
https://lists.apache.org/x/thread.html/r1eaa02dd97d7b1c2376a76ed213ac03d1956f49f209b57ec4d84fe25@%3Cdev.subversion.apache.org%3E

Nathan

Re: JNI segfault while running Java tests

Posted by James McCoy <ja...@jamessan.com>.
On Sat, Aug 08, 2020 at 10:35:14AM -0400, James McCoy wrote:
> The Debian builds for 1.14.0 recently started crashing while running the
> Java tests.  This is pretty far out of my expertise, so hopefully
> someone can help out.
> 
> I've attached the hs_err file.
> 
> The tests used to pass with OpenJDK 11.0.7+10 and we recently got an
> update to 11.0.8+10.  I'm trying to run against the older version to see
> if the tests still pass with that config.

Rebuilding with the older OpenJDK still fails.  Since the other major
update was switching from GCC 9 to GCC 10, I tried rebuilding with GCC 9
and that succeeds.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB