You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Marc Strapetz <ma...@syntevo.com> on 2015/03/19 11:43:01 UTC

1.9 JavaHL memory leak in ISVNRemote#status

Attached example performs an endless series of remote status against the 
Subversion repository. When invoked with -Xmx24M, the VM will run out of 
memory soon. Monitoring with jvisualvm shows that the used heap size 
constantly grows. Monitoring with the Task Manager shows that the 
allocated memory grows even more (significantly). Looks like a memory 
leak, for which a large amount of native memory is involved, too.

Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 19.03.2015 11:43, Marc Strapetz wrote:
> Attached example performs an endless series of remote status against
> the Subversion repository. When invoked with -Xmx24M, the VM will run
> out of memory soon. Monitoring with jvisualvm shows that the used heap
> size constantly grows. Monitoring with the Task Manager shows that the
> allocated memory grows even more (significantly). Looks like a memory
> leak, for which a large amount of native memory is involved, too.
>
> Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.

Hi Marc,

Thanks for the report; I'll take a closer look tonight.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Marc Strapetz <ma...@syntevo.com>.

On 29.04.2015 17:44, Branko Čibej wrote:
> On 29.04.2015 17:03, Branko Čibej wrote:
>> On 29.04.2015 16:02, Branko Čibej wrote:
>>> On 29.04.2015 11:57, Marc Strapetz wrote:
>>>> On 29.04.2015 05:31, Branko Čibej wrote:
>>>>> On 28.04.2015 21:22, Bert Huijben wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>>>>>>> Sent: dinsdag 28 april 2015 20:26
>>>>>>> To: Branko Čibej
>>>>>>> Cc: Subversion Development
>>>>>>> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status
>>>>>>> Also, I should add that according to the Profiler, the byte[]s are
>>>>>>> referenced from the Checksums. The char[]s are referenced from the
>>>>>>> Strings. And the Strings are referenced directly as JNI local
>>>>>>> references. Browsing through these Strings, they seem to be
>>>>>>> server-side
>>>>>>> paths ("subversion/branches/1.8.x/...")
>>>>>> Just guessing: Notifications?
>>>>> No, this is an RA status edit drive; there are no notifications, only
>>>>> editor callbacks, and the checksum objects are created in in the
>>>>> callbacks related to file content changes (file contents streams and
>>>>> checksums always come in pairs).
>>>>>
>>>>> I counted creations, finalizations and garbage collections again. I
>>>>> added forced finalization and GC calls to the test case. For every loop
>>>>> in the test, we create 57 Checksum instances, but only one of them is
>>>>> finalized, no matter how often the finalizer and GC are run. All the
>>>>> Checksum objects are created in the same way, and here are /no/
>>>>> references anywhere to the remaining 56 objects, yet they're neither
>>>>> finalized nor garbage-collected. The fields (byte array and kind) /are/
>>>>> collected; all the "live" (according to the heap profiler) Checksum
>>>>> objects have their fields set to null.
>>>> I've been testing on Windows. According to JProfiler and JVisualVM,
>>>> byte[]s are still referenced from the Checksums. Hence, I would expect
>>>> that they are not garbage collected.
>>>>
>>>>> clearly, the code is cleaning up the references correctly
>>>> I don't have detailed understanding of the "jniwrapper" package, but I
>>>> tend to agree with you. In the native code, CreateJ::Checksum and
>>>> CreateJ::PropertyMap are basically doing the same thing, so there is
>>>> no reason why Checksums would remain referenced while HashMaps
>>>> properly do not.
>>>>
>>>> I've also tried to comment out all env.CallVoidMethod()-callbacks in
>>>> EditorProxy.cpp, so created object references would not even be passed
>>>> into the Java code. Still the same, Checksums remain as "JNI local
>>>> reference".
>>>>
>>>> Finally, I've tried to explicitly call DeleteLocalRef(). This /solves/
>>>> the memory leak (at least for Checksums), but I don't understand why
>>>> this is necessary and whether this is correct.
>>>>
>>>> svn_error_t*
>>>> EditorProxy::cb_alter_file(void *baton,
>>>>                             const char *relpath,
>>>>    ...
>>>>    jstring jrelpath = JNIUtil::makeJString(relpath);
>>>>    SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>>>    jobject jchecksum = CreateJ::Checksum(checksum);
>>>>    SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>>>    jobject jprops = CreateJ::PropertyMap(props, scratch_pool);
>>>>    SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>>>
>>>>    jobject jcontents = NULL;
>>>>    if (contents != NULL)
>>>>      jcontents = wrap_input_stream(contents);
>>>>
>>>>    env.CallVoidMethod(ep->m_jeditor, mid,
>>>>                       jrelpath, jlong(revision),
>>>>                       jchecksum, jcontents, jprops);
>>>>
>>>>    env.DeleteLocalRef(jrelpath);
>>>>    env.DeleteLocalRef(jchecksum);
>>>>    env.DeleteLocalRef(jprops);
>>>>    if (contents != NULL)
>>>>      env.DeleteLocalRef(jcontents);
>>>>    ...
>>>>
>>>>> but for some unfathomable reason, the collector keeps them
>>>>> alive for a while.
>>>> I'm not entirely sure about the exact difference of the live data in
>>>> the VM and a heap dump, but IMO the Checksums are still considered as
>>>> referenced ("JNI local reference") and hence will never be garbage
>>>> collected. The profilers confirm this.
>>>>
>>>> Given that DeleteLocalRef solves the problem, I think this is either a
>>>> bug in the jniwrapper or a bug in JNI itself.
>>> The latest code wraps the callback implementations with
>>> PushLocalFrame/PopLocalFrame; any references created within a local
>>> frame should be automatically deleted by PopLocalFrame, according to all
>>> JNI docs I can find.
>>>
>>> I can add the explicit deletions, but it's a shame that frame management
>>> wouldn't work as expected. :( So, I'm going to double-check if we're
>>> actually getting the frame management right. I can't imagine why the
>>> HashMaps and NativeInputStreams would be released, but not the Checksums.
>>>
>>> All in all, I agree with you that this looks like a JNI bug ... the
>>> trick now will be to prove that with a minimal test case and report it
>>> upstream. :)
>>>
>>> (FWIW, I'm using Java 8u45 64-bit on OSX.)
>>
>> So, interesting data point ... I moved the creation of the Checksum
>> objects after the creation of the property maps ... and now they're
>> getting garbage-collected. This is becoming extremely weird.
>
> Hah. Fixed it. http://svn.apache.org/r1676771
>
> We were not properly popping off a JNI frame in CreateJ::PropertyMap, so
> the checksum objects were referenced in a JNI frame that was left
> hanging in limbo after the call to finishReport returned.
>
> I'm running your test case now (with -Xmx24M), currently at over 350
> iterations with total memory usage stable at around 50MB with no forced
> garbage collections.

I can confirm: two subsequent heap dumps with sufficient time between 
them do not differ even in a single instance :) Great!

Thanks, Brane, for staying on that issue -- I expect that this will 
solve a large amount of OOMEs we are getting reported.

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 29.04.2015 17:03, Branko Čibej wrote:
> On 29.04.2015 16:02, Branko Čibej wrote:
>> On 29.04.2015 11:57, Marc Strapetz wrote:
>>> On 29.04.2015 05:31, Branko Čibej wrote:
>>>> On 28.04.2015 21:22, Bert Huijben wrote:
>>>>>> -----Original Message-----
>>>>>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>>>>>> Sent: dinsdag 28 april 2015 20:26
>>>>>> To: Branko Čibej
>>>>>> Cc: Subversion Development
>>>>>> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status
>>>>>> Also, I should add that according to the Profiler, the byte[]s are
>>>>>> referenced from the Checksums. The char[]s are referenced from the
>>>>>> Strings. And the Strings are referenced directly as JNI local
>>>>>> references. Browsing through these Strings, they seem to be
>>>>>> server-side
>>>>>> paths ("subversion/branches/1.8.x/...")
>>>>> Just guessing: Notifications?
>>>> No, this is an RA status edit drive; there are no notifications, only
>>>> editor callbacks, and the checksum objects are created in in the
>>>> callbacks related to file content changes (file contents streams and
>>>> checksums always come in pairs).
>>>>
>>>> I counted creations, finalizations and garbage collections again. I
>>>> added forced finalization and GC calls to the test case. For every loop
>>>> in the test, we create 57 Checksum instances, but only one of them is
>>>> finalized, no matter how often the finalizer and GC are run. All the
>>>> Checksum objects are created in the same way, and here are /no/
>>>> references anywhere to the remaining 56 objects, yet they're neither
>>>> finalized nor garbage-collected. The fields (byte array and kind) /are/
>>>> collected; all the "live" (according to the heap profiler) Checksum
>>>> objects have their fields set to null.
>>> I've been testing on Windows. According to JProfiler and JVisualVM,
>>> byte[]s are still referenced from the Checksums. Hence, I would expect
>>> that they are not garbage collected.
>>>
>>>> clearly, the code is cleaning up the references correctly
>>> I don't have detailed understanding of the "jniwrapper" package, but I
>>> tend to agree with you. In the native code, CreateJ::Checksum and
>>> CreateJ::PropertyMap are basically doing the same thing, so there is
>>> no reason why Checksums would remain referenced while HashMaps
>>> properly do not.
>>>
>>> I've also tried to comment out all env.CallVoidMethod()-callbacks in
>>> EditorProxy.cpp, so created object references would not even be passed
>>> into the Java code. Still the same, Checksums remain as "JNI local
>>> reference".
>>>
>>> Finally, I've tried to explicitly call DeleteLocalRef(). This /solves/
>>> the memory leak (at least for Checksums), but I don't understand why
>>> this is necessary and whether this is correct.
>>>
>>> svn_error_t*
>>> EditorProxy::cb_alter_file(void *baton,
>>>                            const char *relpath,
>>>   ...
>>>   jstring jrelpath = JNIUtil::makeJString(relpath);
>>>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>>   jobject jchecksum = CreateJ::Checksum(checksum);
>>>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>>   jobject jprops = CreateJ::PropertyMap(props, scratch_pool);
>>>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>>
>>>   jobject jcontents = NULL;
>>>   if (contents != NULL)
>>>     jcontents = wrap_input_stream(contents);
>>>
>>>   env.CallVoidMethod(ep->m_jeditor, mid,
>>>                      jrelpath, jlong(revision),
>>>                      jchecksum, jcontents, jprops);
>>>
>>>   env.DeleteLocalRef(jrelpath);
>>>   env.DeleteLocalRef(jchecksum);
>>>   env.DeleteLocalRef(jprops);
>>>   if (contents != NULL)
>>>     env.DeleteLocalRef(jcontents);
>>>   ...
>>>
>>>> but for some unfathomable reason, the collector keeps them
>>>> alive for a while.
>>> I'm not entirely sure about the exact difference of the live data in
>>> the VM and a heap dump, but IMO the Checksums are still considered as
>>> referenced ("JNI local reference") and hence will never be garbage
>>> collected. The profilers confirm this.
>>>
>>> Given that DeleteLocalRef solves the problem, I think this is either a
>>> bug in the jniwrapper or a bug in JNI itself.
>> The latest code wraps the callback implementations with
>> PushLocalFrame/PopLocalFrame; any references created within a local
>> frame should be automatically deleted by PopLocalFrame, according to all
>> JNI docs I can find.
>>
>> I can add the explicit deletions, but it's a shame that frame management
>> wouldn't work as expected. :( So, I'm going to double-check if we're
>> actually getting the frame management right. I can't imagine why the
>> HashMaps and NativeInputStreams would be released, but not the Checksums.
>>
>> All in all, I agree with you that this looks like a JNI bug ... the
>> trick now will be to prove that with a minimal test case and report it
>> upstream. :)
>>
>> (FWIW, I'm using Java 8u45 64-bit on OSX.)
>
> So, interesting data point ... I moved the creation of the Checksum
> objects after the creation of the property maps ... and now they're
> getting garbage-collected. This is becoming extremely weird.

Hah. Fixed it. http://svn.apache.org/r1676771

We were not properly popping off a JNI frame in CreateJ::PropertyMap, so
the checksum objects were referenced in a JNI frame that was left
hanging in limbo after the call to finishReport returned.

I'm running your test case now (with -Xmx24M), currently at over 350
iterations with total memory usage stable at around 50MB with no forced
garbage collections.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 29.04.2015 16:02, Branko Čibej wrote:
> On 29.04.2015 11:57, Marc Strapetz wrote:
>> On 29.04.2015 05:31, Branko Čibej wrote:
>>> On 28.04.2015 21:22, Bert Huijben wrote:
>>>>> -----Original Message-----
>>>>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>>>>> Sent: dinsdag 28 april 2015 20:26
>>>>> To: Branko Čibej
>>>>> Cc: Subversion Development
>>>>> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status
>>>>> Also, I should add that according to the Profiler, the byte[]s are
>>>>> referenced from the Checksums. The char[]s are referenced from the
>>>>> Strings. And the Strings are referenced directly as JNI local
>>>>> references. Browsing through these Strings, they seem to be
>>>>> server-side
>>>>> paths ("subversion/branches/1.8.x/...")
>>>> Just guessing: Notifications?
>>>
>>> No, this is an RA status edit drive; there are no notifications, only
>>> editor callbacks, and the checksum objects are created in in the
>>> callbacks related to file content changes (file contents streams and
>>> checksums always come in pairs).
>>>
>>> I counted creations, finalizations and garbage collections again. I
>>> added forced finalization and GC calls to the test case. For every loop
>>> in the test, we create 57 Checksum instances, but only one of them is
>>> finalized, no matter how often the finalizer and GC are run. All the
>>> Checksum objects are created in the same way, and here are /no/
>>> references anywhere to the remaining 56 objects, yet they're neither
>>> finalized nor garbage-collected. The fields (byte array and kind) /are/
>>> collected; all the "live" (according to the heap profiler) Checksum
>>> objects have their fields set to null.
>> I've been testing on Windows. According to JProfiler and JVisualVM,
>> byte[]s are still referenced from the Checksums. Hence, I would expect
>> that they are not garbage collected.
>>
>>> clearly, the code is cleaning up the references correctly
>> I don't have detailed understanding of the "jniwrapper" package, but I
>> tend to agree with you. In the native code, CreateJ::Checksum and
>> CreateJ::PropertyMap are basically doing the same thing, so there is
>> no reason why Checksums would remain referenced while HashMaps
>> properly do not.
>>
>> I've also tried to comment out all env.CallVoidMethod()-callbacks in
>> EditorProxy.cpp, so created object references would not even be passed
>> into the Java code. Still the same, Checksums remain as "JNI local
>> reference".
>>
>> Finally, I've tried to explicitly call DeleteLocalRef(). This /solves/
>> the memory leak (at least for Checksums), but I don't understand why
>> this is necessary and whether this is correct.
>>
>> svn_error_t*
>> EditorProxy::cb_alter_file(void *baton,
>>                            const char *relpath,
>>   ...
>>   jstring jrelpath = JNIUtil::makeJString(relpath);
>>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>   jobject jchecksum = CreateJ::Checksum(checksum);
>>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>   jobject jprops = CreateJ::PropertyMap(props, scratch_pool);
>>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>>
>>   jobject jcontents = NULL;
>>   if (contents != NULL)
>>     jcontents = wrap_input_stream(contents);
>>
>>   env.CallVoidMethod(ep->m_jeditor, mid,
>>                      jrelpath, jlong(revision),
>>                      jchecksum, jcontents, jprops);
>>
>>   env.DeleteLocalRef(jrelpath);
>>   env.DeleteLocalRef(jchecksum);
>>   env.DeleteLocalRef(jprops);
>>   if (contents != NULL)
>>     env.DeleteLocalRef(jcontents);
>>   ...
>>
>>> but for some unfathomable reason, the collector keeps them
>>> alive for a while.
>> I'm not entirely sure about the exact difference of the live data in
>> the VM and a heap dump, but IMO the Checksums are still considered as
>> referenced ("JNI local reference") and hence will never be garbage
>> collected. The profilers confirm this.
>>
>> Given that DeleteLocalRef solves the problem, I think this is either a
>> bug in the jniwrapper or a bug in JNI itself.
> The latest code wraps the callback implementations with
> PushLocalFrame/PopLocalFrame; any references created within a local
> frame should be automatically deleted by PopLocalFrame, according to all
> JNI docs I can find.
>
> I can add the explicit deletions, but it's a shame that frame management
> wouldn't work as expected. :( So, I'm going to double-check if we're
> actually getting the frame management right. I can't imagine why the
> HashMaps and NativeInputStreams would be released, but not the Checksums.
>
> All in all, I agree with you that this looks like a JNI bug ... the
> trick now will be to prove that with a minimal test case and report it
> upstream. :)
>
> (FWIW, I'm using Java 8u45 64-bit on OSX.)


So, interesting data point ... I moved the creation of the Checksum
objects after the creation of the property maps ... and now they're
getting garbage-collected. This is becoming extremely weird.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 29.04.2015 11:57, Marc Strapetz wrote:
> On 29.04.2015 05:31, Branko Čibej wrote:
>> On 28.04.2015 21:22, Bert Huijben wrote:
>>>
>>>> -----Original Message-----
>>>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>>>> Sent: dinsdag 28 april 2015 20:26
>>>> To: Branko Čibej
>>>> Cc: Subversion Development
>>>> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status
>>>> Also, I should add that according to the Profiler, the byte[]s are
>>>> referenced from the Checksums. The char[]s are referenced from the
>>>> Strings. And the Strings are referenced directly as JNI local
>>>> references. Browsing through these Strings, they seem to be
>>>> server-side
>>>> paths ("subversion/branches/1.8.x/...")
>>> Just guessing: Notifications?
>>
>>
>> No, this is an RA status edit drive; there are no notifications, only
>> editor callbacks, and the checksum objects are created in in the
>> callbacks related to file content changes (file contents streams and
>> checksums always come in pairs).
>>
>> I counted creations, finalizations and garbage collections again. I
>> added forced finalization and GC calls to the test case. For every loop
>> in the test, we create 57 Checksum instances, but only one of them is
>> finalized, no matter how often the finalizer and GC are run. All the
>> Checksum objects are created in the same way, and here are /no/
>> references anywhere to the remaining 56 objects, yet they're neither
>> finalized nor garbage-collected. The fields (byte array and kind) /are/
>> collected; all the "live" (according to the heap profiler) Checksum
>> objects have their fields set to null.
>
> I've been testing on Windows. According to JProfiler and JVisualVM,
> byte[]s are still referenced from the Checksums. Hence, I would expect
> that they are not garbage collected.
>
>> clearly, the code is cleaning up the references correctly
>
> I don't have detailed understanding of the "jniwrapper" package, but I
> tend to agree with you. In the native code, CreateJ::Checksum and
> CreateJ::PropertyMap are basically doing the same thing, so there is
> no reason why Checksums would remain referenced while HashMaps
> properly do not.
>
> I've also tried to comment out all env.CallVoidMethod()-callbacks in
> EditorProxy.cpp, so created object references would not even be passed
> into the Java code. Still the same, Checksums remain as "JNI local
> reference".
>
> Finally, I've tried to explicitly call DeleteLocalRef(). This /solves/
> the memory leak (at least for Checksums), but I don't understand why
> this is necessary and whether this is correct.
>
> svn_error_t*
> EditorProxy::cb_alter_file(void *baton,
>                            const char *relpath,
>   ...
>   jstring jrelpath = JNIUtil::makeJString(relpath);
>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>   jobject jchecksum = CreateJ::Checksum(checksum);
>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>   jobject jprops = CreateJ::PropertyMap(props, scratch_pool);
>   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
>
>   jobject jcontents = NULL;
>   if (contents != NULL)
>     jcontents = wrap_input_stream(contents);
>
>   env.CallVoidMethod(ep->m_jeditor, mid,
>                      jrelpath, jlong(revision),
>                      jchecksum, jcontents, jprops);
>
>   env.DeleteLocalRef(jrelpath);
>   env.DeleteLocalRef(jchecksum);
>   env.DeleteLocalRef(jprops);
>   if (contents != NULL)
>     env.DeleteLocalRef(jcontents);
>   ...
>
>> but for some unfathomable reason, the collector keeps them
>> alive for a while.
>
> I'm not entirely sure about the exact difference of the live data in
> the VM and a heap dump, but IMO the Checksums are still considered as
> referenced ("JNI local reference") and hence will never be garbage
> collected. The profilers confirm this.
>
> Given that DeleteLocalRef solves the problem, I think this is either a
> bug in the jniwrapper or a bug in JNI itself.

The latest code wraps the callback implementations with
PushLocalFrame/PopLocalFrame; any references created within a local
frame should be automatically deleted by PopLocalFrame, according to all
JNI docs I can find.

I can add the explicit deletions, but it's a shame that frame management
wouldn't work as expected. :( So, I'm going to double-check if we're
actually getting the frame management right. I can't imagine why the
HashMaps and NativeInputStreams would be released, but not the Checksums.

All in all, I agree with you that this looks like a JNI bug ... the
trick now will be to prove that with a minimal test case and report it
upstream. :)

(FWIW, I'm using Java 8u45 64-bit on OSX.)

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Marc Strapetz <ma...@syntevo.com>.

On 29.04.2015 05:31, Branko Čibej wrote:
> On 28.04.2015 21:22, Bert Huijben wrote:
>>
>>> -----Original Message-----
>>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>>> Sent: dinsdag 28 april 2015 20:26
>>> To: Branko Čibej
>>> Cc: Subversion Development
>>> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status
>>> Also, I should add that according to the Profiler, the byte[]s are
>>> referenced from the Checksums. The char[]s are referenced from the
>>> Strings. And the Strings are referenced directly as JNI local
>>> references. Browsing through these Strings, they seem to be server-side
>>> paths ("subversion/branches/1.8.x/...")
>> Just guessing: Notifications?
>
>
> No, this is an RA status edit drive; there are no notifications, only
> editor callbacks, and the checksum objects are created in in the
> callbacks related to file content changes (file contents streams and
> checksums always come in pairs).
>
> I counted creations, finalizations and garbage collections again. I
> added forced finalization and GC calls to the test case. For every loop
> in the test, we create 57 Checksum instances, but only one of them is
> finalized, no matter how often the finalizer and GC are run. All the
> Checksum objects are created in the same way, and here are /no/
> references anywhere to the remaining 56 objects, yet they're neither
> finalized nor garbage-collected. The fields (byte array and kind) /are/
> collected; all the "live" (according to the heap profiler) Checksum
> objects have their fields set to null.

I've been testing on Windows. According to JProfiler and JVisualVM, 
byte[]s are still referenced from the Checksums. Hence, I would expect 
that they are not garbage collected.

>clearly, the code is cleaning up the references correctly

I don't have detailed understanding of the "jniwrapper" package, but I 
tend to agree with you. In the native code, CreateJ::Checksum and 
CreateJ::PropertyMap are basically doing the same thing, so there is no 
reason why Checksums would remain referenced while HashMaps properly do not.

I've also tried to comment out all env.CallVoidMethod()-callbacks in 
EditorProxy.cpp, so created object references would not even be passed 
into the Java code. Still the same, Checksums remain as "JNI local 
reference".

Finally, I've tried to explicitly call DeleteLocalRef(). This /solves/ 
the memory leak (at least for Checksums), but I don't understand why 
this is necessary and whether this is correct.

svn_error_t*
EditorProxy::cb_alter_file(void *baton,
                            const char *relpath,
   ...
   jstring jrelpath = JNIUtil::makeJString(relpath);
   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
   jobject jchecksum = CreateJ::Checksum(checksum);
   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);
   jobject jprops = CreateJ::PropertyMap(props, scratch_pool);
   SVN_JAVAHL_OLDSTYLE_EXCEPTION_CHECK(env);

   jobject jcontents = NULL;
   if (contents != NULL)
     jcontents = wrap_input_stream(contents);

   env.CallVoidMethod(ep->m_jeditor, mid,
                      jrelpath, jlong(revision),
                      jchecksum, jcontents, jprops);

   env.DeleteLocalRef(jrelpath);
   env.DeleteLocalRef(jchecksum);
   env.DeleteLocalRef(jprops);
   if (contents != NULL)
     env.DeleteLocalRef(jcontents);
   ...

> but for some unfathomable reason, the collector keeps them
> alive for a while.

I'm not entirely sure about the exact difference of the live data in the 
VM and a heap dump, but IMO the Checksums are still considered as 
referenced ("JNI local reference") and hence will never be garbage 
collected. The profilers confirm this.

Given that DeleteLocalRef solves the problem, I think this is either a 
bug in the jniwrapper or a bug in JNI itself.

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 28.04.2015 21:22, Bert Huijben wrote:
>
>> -----Original Message-----
>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>> Sent: dinsdag 28 april 2015 20:26
>> To: Branko Čibej
>> Cc: Subversion Development
>> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status
>> Also, I should add that according to the Profiler, the byte[]s are
>> referenced from the Checksums. The char[]s are referenced from the
>> Strings. And the Strings are referenced directly as JNI local
>> references. Browsing through these Strings, they seem to be server-side
>> paths ("subversion/branches/1.8.x/...")
> Just guessing: Notifications?

No, this is an RA status edit drive; there are no notifications, only
editor callbacks, and the checksum objects are created in in the
callbacks related to file content changes (file contents streams and
checksums always come in pairs).

I counted creations, finalizations and garbage collections again. I
added forced finalization and GC calls to the test case. For every loop
in the test, we create 57 Checksum instances, but only one of them is
finalized, no matter how often the finalizer and GC are run. All the
Checksum objects are created in the same way, and here are /no/
references anywhere to the remaining 56 objects, yet they're neither
finalized nor garbage-collected. The fields (byte array and kind) /are/
collected; all the "live" (according to the heap profiler) Checksum
objects have their fields set to null.

I'm going to write this off as an implementation detail of the Java
garbage collector; clearly, the code is cleaning up the references
correctly, but for some unfathomable reason, the collector keeps them
alive for a while. Maybe this will behave differently in a real
application than in this limited test case.

-- Brane

RE: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Bert Huijben <be...@qqmail.nl>.


> -----Original Message-----
> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
> Sent: dinsdag 28 april 2015 20:26
> To: Branko Čibej
> Cc: Subversion Development
> Subject: Re: 1.9 JavaHL memory leak in ISVNRemote#status

> Also, I should add that according to the Profiler, the byte[]s are
> referenced from the Checksums. The char[]s are referenced from the
> Strings. And the Strings are referenced directly as JNI local
> references. Browsing through these Strings, they seem to be server-side
> paths ("subversion/branches/1.8.x/...")

Just guessing: Notifications?

	Bert

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Marc Strapetz <ma...@syntevo.com>.

On 28.04.2015 20:06, Marc Strapetz wrote:
> On 28.04.2015 18:12, Branko Čibej wrote:
>> On 28.04.2015 18:03, Marc Strapetz wrote:
>>> Hi Brane,
>>>
>>> On 28.04.2015 07:36, Branko Čibej wrote:
>>>> On 24.04.2015 14:11, Branko Čibej wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Just a quick note: your last msg jogged my memory and I think I know
>>>>> the root cause of the leak: improper JNI frame management within a
>>>>> loop. If I'm right, I can both fix the leak and remove the
>>>>> close-stream requirement I just added.
>>>>>
>>>>> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz@syntevo.com
>>>>> <ma...@syntevo.com>> wrote:
>>>>>
>>>>>      On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>>>>>
>>>>>          On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej
>>>>> wrote:
>>>>>
>>>>>              On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej
>>>>> wrote:
>>>>>
>>>>>                  On 19.03.2015 11:43, Marc Strapetz wrote:
>>>>>
>>>>>                      Attached example performs an endless series of
>>>>>                      remote status against
>>>>>                      the Subversion repository. When invoked with
>>>>>                      -Xmx24M, the VM will run
>>>>>                      out of memory soon. Monitoring with jvisualvm
>>>>>                      shows that the used heap
>>>>>                      size constantly grows. Monitoring with the Task
>>>>>                      Manager shows that the
>>>>>                      allocated memory grows even more (significantly).
>>>>>                      Looks like a memory
>>>>>                      leak, for which a large amount of native
>>>>> memory is
>>>>>                      involved, too.
>>>>>
>>>>>                      Tested on Windows 8.1 with almost latest
>>>>>                      Subversion 1.9 JavaHL builds.
>>>>>
>>>>>                  I can confirm that this happens on the Mac, too, and
>>>>>                  it's not a garbage
>>>>>                  collector artefact. I'm trying to trace where the
>>>>> leak
>>>>>                  is happening ...
>>>>>                  valgrind with APR pool debugging doesn't tell me much
>>>>>                  (no surprise there).
>>>>>
>>>>>              Just to make sure we weren't doing something bad in our
>>>>>              libraries, I
>>>>>              wrote a small C program that does the same as your Java
>>>>>              example (Ev2
>>>>>              shims included), and memory usage is completely
>>>>> steady. So
>>>>>              it is
>>>>>              something in JavaHL, but I have no clue yet what the
>>>>>              problem is.
>>>>>
>>>>>
>>>>>          I have to say this was one of the more "interesting"
>>>>> bug-hunts
>>>>>          in my not
>>>>>          entirely boring career, and that's not really obvious from
>>>>> the fix
>>>>>          itself. :)
>>>>>
>>>>>          http://svn.apache.org/r1675771
>>>>>
>>>>>          Marc: this will not be in RC1, but please give the patch a
>>>>>          spin and let
>>>>>          me know if it fixes your problem. I tested this with the Java
>>>>>          program
>>>>>          you attached to your original report, and heap size no longer
>>>>>          grows
>>>>>          without bounds.
>>>>>
>>>>>
>>>>>      Great hunt, Brane! The native leak seems to be fixed. I've run my
>>>>>      remote status loop with -Xmx24M and still get an OOME after ~170
>>>>>      loop iterations. The memory leak is significantly smaller and
>>>>> this
>>>>>      time it seems to be in the Java part. According to the profiler,
>>>>>      most memory is allocated by HashMap and friends, referenced from
>>>>>      JNI code. Only two org.apache.subversion classes show up, but I
>>>>>      guess they indicate the source of the leak:
>>>>>
>>>>>      org.apache.subversion.javahl.types.Checksum (~10K instances)
>>>>>      org.apache.subversion.javahl.types.NativeInputStream (~10K
>>>>> instances)
>>>>>
>>>>>      Let me know, if you more profiler statistics will be helpful.
>>>>>
>>>>
>>>> So I've been looking at this in depth. At first I thought that one of
>>>> the problems was that we didn't release JNI local references; I added
>>>> code to make sure this happens in the status callbacks (not committed
>>>> yet) and I verified that all the native wrapped objects do get
>>>> finalized. However, the Java objects still hang around.
>>>>
>>>> One of the problems is that all the callbacks happen within the
>>>> scope of
>>>> the ISVNReporter.finishReport call, which means that the whole edit
>>>> drive is considered a single JNI call (despite the callbacks to Java)
>>>> and the garbage collector can't reclaim space for the objects created
>>>> within JNI during that time. But even a forced GC after the report is
>>>> done and the remote session disposed won't release all the native
>>>> references. I'm a bit stumped here ... JVM's built-in memory profiler
>>>> shows the live references and where they're allocated, but doesn't show
>>>> why they're not released even when I explicitly create and destroy JNI
>>>> frames.
>>>
>>> Can you please commit your current state somewhere or send me a patch?
>>> I can give this one a try in JProfiler and see whether I can gather
>>> some more useful information.
>>
>> Here's the complete patch against 1.9.x. I do see some NativeInputStream
>> objects (but not all) being garbage-collected, but there are a number of
>> other objects, even those allocated in Java code in the callback, that
>> are just hanging around, even if I force a GC. Any help in tracking this
>> down will be greatly appreciated.
>
> Thanks, Brane. I'm not sure whether I can confirm this: I took a couple
> of heap dumps while running the remote status 1000 times in a loop and
> my impression was that the amount of almost all objects was basically
> constant, except for Checksum which was constantly rising. The only
> reference to those Checksum objects are "JNI local references". The
> final heap (after 1000 iterations) looks like this:
>
> http://imgur.com/gLdnv84
>
> As you can see, there are no "contents" InputStreams nor "properties"
> (Hash)Maps referenced. AFAIU, I should see almost the same amount of
> objects in all three cases?

Also, I should add that according to the Profiler, the byte[]s are 
referenced from the Checksums. The char[]s are referenced from the 
Strings. And the Strings are referenced directly as JNI local 
references. Browsing through these Strings, they seem to be server-side 
paths ("subversion/branches/1.8.x/...")

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 28.04.2015 20:06, Marc Strapetz wrote:
> On 28.04.2015 18:12, Branko Čibej wrote:
>> On 28.04.2015 18:03, Marc Strapetz wrote:
>>> Hi Brane,
>>>
>>> On 28.04.2015 07:36, Branko Čibej wrote:
>>>> On 24.04.2015 14:11, Branko Čibej wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Just a quick note: your last msg jogged my memory and I think I know
>>>>> the root cause of the leak: improper JNI frame management within a
>>>>> loop. If I'm right, I can both fix the leak and remove the
>>>>> close-stream requirement I just added.
>>>>>
>>>>> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz@syntevo.com
>>>>> <ma...@syntevo.com>> wrote:
>>>>>
>>>>>      On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>>>>>
>>>>>          On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej
>>>>> wrote:
>>>>>
>>>>>              On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej
>>>>> wrote:
>>>>>
>>>>>                  On 19.03.2015 11:43, Marc Strapetz wrote:
>>>>>
>>>>>                      Attached example performs an endless series of
>>>>>                      remote status against
>>>>>                      the Subversion repository. When invoked with
>>>>>                      -Xmx24M, the VM will run
>>>>>                      out of memory soon. Monitoring with jvisualvm
>>>>>                      shows that the used heap
>>>>>                      size constantly grows. Monitoring with the Task
>>>>>                      Manager shows that the
>>>>>                      allocated memory grows even more
>>>>> (significantly).
>>>>>                      Looks like a memory
>>>>>                      leak, for which a large amount of native
>>>>> memory is
>>>>>                      involved, too.
>>>>>
>>>>>                      Tested on Windows 8.1 with almost latest
>>>>>                      Subversion 1.9 JavaHL builds.
>>>>>
>>>>>                  I can confirm that this happens on the Mac, too, and
>>>>>                  it's not a garbage
>>>>>                  collector artefact. I'm trying to trace where the
>>>>> leak
>>>>>                  is happening ...
>>>>>                  valgrind with APR pool debugging doesn't tell me
>>>>> much
>>>>>                  (no surprise there).
>>>>>
>>>>>              Just to make sure we weren't doing something bad in our
>>>>>              libraries, I
>>>>>              wrote a small C program that does the same as your Java
>>>>>              example (Ev2
>>>>>              shims included), and memory usage is completely
>>>>> steady. So
>>>>>              it is
>>>>>              something in JavaHL, but I have no clue yet what the
>>>>>              problem is.
>>>>>
>>>>>
>>>>>          I have to say this was one of the more "interesting"
>>>>> bug-hunts
>>>>>          in my not
>>>>>          entirely boring career, and that's not really obvious from
>>>>> the fix
>>>>>          itself. :)
>>>>>
>>>>>          http://svn.apache.org/r1675771
>>>>>
>>>>>          Marc: this will not be in RC1, but please give the patch a
>>>>>          spin and let
>>>>>          me know if it fixes your problem. I tested this with the
>>>>> Java
>>>>>          program
>>>>>          you attached to your original report, and heap size no
>>>>> longer
>>>>>          grows
>>>>>          without bounds.
>>>>>
>>>>>
>>>>>      Great hunt, Brane! The native leak seems to be fixed. I've
>>>>> run my
>>>>>      remote status loop with -Xmx24M and still get an OOME after ~170
>>>>>      loop iterations. The memory leak is significantly smaller and
>>>>> this
>>>>>      time it seems to be in the Java part. According to the profiler,
>>>>>      most memory is allocated by HashMap and friends, referenced from
>>>>>      JNI code. Only two org.apache.subversion classes show up, but I
>>>>>      guess they indicate the source of the leak:
>>>>>
>>>>>      org.apache.subversion.javahl.types.Checksum (~10K instances)
>>>>>      org.apache.subversion.javahl.types.NativeInputStream (~10K
>>>>> instances)
>>>>>
>>>>>      Let me know, if you more profiler statistics will be helpful.
>>>>>
>>>>
>>>> So I've been looking at this in depth. At first I thought that one of
>>>> the problems was that we didn't release JNI local references; I added
>>>> code to make sure this happens in the status callbacks (not committed
>>>> yet) and I verified that all the native wrapped objects do get
>>>> finalized. However, the Java objects still hang around.
>>>>
>>>> One of the problems is that all the callbacks happen within the
>>>> scope of
>>>> the ISVNReporter.finishReport call, which means that the whole edit
>>>> drive is considered a single JNI call (despite the callbacks to Java)
>>>> and the garbage collector can't reclaim space for the objects created
>>>> within JNI during that time. But even a forced GC after the report is
>>>> done and the remote session disposed won't release all the native
>>>> references. I'm a bit stumped here ... JVM's built-in memory profiler
>>>> shows the live references and where they're allocated, but doesn't
>>>> show
>>>> why they're not released even when I explicitly create and destroy JNI
>>>> frames.
>>>
>>> Can you please commit your current state somewhere or send me a patch?
>>> I can give this one a try in JProfiler and see whether I can gather
>>> some more useful information.
>>
>> Here's the complete patch against 1.9.x. I do see some NativeInputStream
>> objects (but not all) being garbage-collected, but there are a number of
>> other objects, even those allocated in Java code in the callback, that
>> are just hanging around, even if I force a GC. Any help in tracking this
>> down will be greatly appreciated.
>
> Thanks, Brane. I'm not sure whether I can confirm this: I took a
> couple of heap dumps while running the remote status 1000 times in a
> loop and my impression was that the amount of almost all objects was
> basically constant, except for Checksum which was constantly rising.
> The only reference to those Checksum objects are "JNI local
> references". The final heap (after 1000 iterations) looks like this:
>
> http://imgur.com/gLdnv84
>
> As you can see, there are no "contents" InputStreams nor "properties"
> (Hash)Maps referenced. AFAIU, I should see almost the same amount of
> objects in all three cases?

Yes, we create the same number of Checksums and NativeInputStreams and
it seems that I did manage to clear the references to the streams, which
is why they're now getting garbage-collected. I've not been able to
figure out why the same is not happening to Checksums (and all the
objects allocated for the remote status events).

I'll keep digging. This is probably some really silly omission
somewhere, it's just a matter of finding it.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Marc Strapetz <ma...@syntevo.com>.

On 28.04.2015 18:12, Branko Čibej wrote:
> On 28.04.2015 18:03, Marc Strapetz wrote:
>> Hi Brane,
>>
>> On 28.04.2015 07:36, Branko Čibej wrote:
>>> On 24.04.2015 14:11, Branko Čibej wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> Just a quick note: your last msg jogged my memory and I think I know
>>>> the root cause of the leak: improper JNI frame management within a
>>>> loop. If I'm right, I can both fix the leak and remove the
>>>> close-stream requirement I just added.
>>>>
>>>> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz@syntevo.com
>>>> <ma...@syntevo.com>> wrote:
>>>>
>>>>      On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>>>>
>>>>          On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej wrote:
>>>>
>>>>              On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej
>>>> wrote:
>>>>
>>>>                  On 19.03.2015 11:43, Marc Strapetz wrote:
>>>>
>>>>                      Attached example performs an endless series of
>>>>                      remote status against
>>>>                      the Subversion repository. When invoked with
>>>>                      -Xmx24M, the VM will run
>>>>                      out of memory soon. Monitoring with jvisualvm
>>>>                      shows that the used heap
>>>>                      size constantly grows. Monitoring with the Task
>>>>                      Manager shows that the
>>>>                      allocated memory grows even more (significantly).
>>>>                      Looks like a memory
>>>>                      leak, for which a large amount of native memory is
>>>>                      involved, too.
>>>>
>>>>                      Tested on Windows 8.1 with almost latest
>>>>                      Subversion 1.9 JavaHL builds.
>>>>
>>>>                  I can confirm that this happens on the Mac, too, and
>>>>                  it's not a garbage
>>>>                  collector artefact. I'm trying to trace where the leak
>>>>                  is happening ...
>>>>                  valgrind with APR pool debugging doesn't tell me much
>>>>                  (no surprise there).
>>>>
>>>>              Just to make sure we weren't doing something bad in our
>>>>              libraries, I
>>>>              wrote a small C program that does the same as your Java
>>>>              example (Ev2
>>>>              shims included), and memory usage is completely steady. So
>>>>              it is
>>>>              something in JavaHL, but I have no clue yet what the
>>>>              problem is.
>>>>
>>>>
>>>>          I have to say this was one of the more "interesting" bug-hunts
>>>>          in my not
>>>>          entirely boring career, and that's not really obvious from
>>>> the fix
>>>>          itself. :)
>>>>
>>>>          http://svn.apache.org/r1675771
>>>>
>>>>          Marc: this will not be in RC1, but please give the patch a
>>>>          spin and let
>>>>          me know if it fixes your problem. I tested this with the Java
>>>>          program
>>>>          you attached to your original report, and heap size no longer
>>>>          grows
>>>>          without bounds.
>>>>
>>>>
>>>>      Great hunt, Brane! The native leak seems to be fixed. I've run my
>>>>      remote status loop with -Xmx24M and still get an OOME after ~170
>>>>      loop iterations. The memory leak is significantly smaller and this
>>>>      time it seems to be in the Java part. According to the profiler,
>>>>      most memory is allocated by HashMap and friends, referenced from
>>>>      JNI code. Only two org.apache.subversion classes show up, but I
>>>>      guess they indicate the source of the leak:
>>>>
>>>>      org.apache.subversion.javahl.types.Checksum (~10K instances)
>>>>      org.apache.subversion.javahl.types.NativeInputStream (~10K
>>>> instances)
>>>>
>>>>      Let me know, if you more profiler statistics will be helpful.
>>>>
>>>
>>> So I've been looking at this in depth. At first I thought that one of
>>> the problems was that we didn't release JNI local references; I added
>>> code to make sure this happens in the status callbacks (not committed
>>> yet) and I verified that all the native wrapped objects do get
>>> finalized. However, the Java objects still hang around.
>>>
>>> One of the problems is that all the callbacks happen within the scope of
>>> the ISVNReporter.finishReport call, which means that the whole edit
>>> drive is considered a single JNI call (despite the callbacks to Java)
>>> and the garbage collector can't reclaim space for the objects created
>>> within JNI during that time. But even a forced GC after the report is
>>> done and the remote session disposed won't release all the native
>>> references. I'm a bit stumped here ... JVM's built-in memory profiler
>>> shows the live references and where they're allocated, but doesn't show
>>> why they're not released even when I explicitly create and destroy JNI
>>> frames.
>>
>> Can you please commit your current state somewhere or send me a patch?
>> I can give this one a try in JProfiler and see whether I can gather
>> some more useful information.
>
> Here's the complete patch against 1.9.x. I do see some NativeInputStream
> objects (but not all) being garbage-collected, but there are a number of
> other objects, even those allocated in Java code in the callback, that
> are just hanging around, even if I force a GC. Any help in tracking this
> down will be greatly appreciated.

Thanks, Brane. I'm not sure whether I can confirm this: I took a couple 
of heap dumps while running the remote status 1000 times in a loop and 
my impression was that the amount of almost all objects was basically 
constant, except for Checksum which was constantly rising. The only 
reference to those Checksum objects are "JNI local references". The 
final heap (after 1000 iterations) looks like this:

http://imgur.com/gLdnv84

As you can see, there are no "contents" InputStreams nor "properties" 
(Hash)Maps referenced. AFAIU, I should see almost the same amount of 
objects in all three cases?

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 28.04.2015 18:03, Marc Strapetz wrote:
> Hi Brane,
>
> On 28.04.2015 07:36, Branko Čibej wrote:
>> On 24.04.2015 14:11, Branko Čibej wrote:
>>>
>>> Hi Marc,
>>>
>>> Just a quick note: your last msg jogged my memory and I think I know
>>> the root cause of the leak: improper JNI frame management within a
>>> loop. If I'm right, I can both fix the leak and remove the
>>> close-stream requirement I just added.
>>>
>>> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz@syntevo.com
>>> <ma...@syntevo.com>> wrote:
>>>
>>>     On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>>>
>>>         On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej wrote:
>>>
>>>             On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej
>>> wrote:
>>>
>>>                 On 19.03.2015 11:43, Marc Strapetz wrote:
>>>
>>>                     Attached example performs an endless series of
>>>                     remote status against
>>>                     the Subversion repository. When invoked with
>>>                     -Xmx24M, the VM will run
>>>                     out of memory soon. Monitoring with jvisualvm
>>>                     shows that the used heap
>>>                     size constantly grows. Monitoring with the Task
>>>                     Manager shows that the
>>>                     allocated memory grows even more (significantly).
>>>                     Looks like a memory
>>>                     leak, for which a large amount of native memory is
>>>                     involved, too.
>>>
>>>                     Tested on Windows 8.1 with almost latest
>>>                     Subversion 1.9 JavaHL builds.
>>>
>>>                 I can confirm that this happens on the Mac, too, and
>>>                 it's not a garbage
>>>                 collector artefact. I'm trying to trace where the leak
>>>                 is happening ...
>>>                 valgrind with APR pool debugging doesn't tell me much
>>>                 (no surprise there).
>>>
>>>             Just to make sure we weren't doing something bad in our
>>>             libraries, I
>>>             wrote a small C program that does the same as your Java
>>>             example (Ev2
>>>             shims included), and memory usage is completely steady. So
>>>             it is
>>>             something in JavaHL, but I have no clue yet what the
>>>             problem is.
>>>
>>>
>>>         I have to say this was one of the more "interesting" bug-hunts
>>>         in my not
>>>         entirely boring career, and that's not really obvious from
>>> the fix
>>>         itself. :)
>>>
>>>         http://svn.apache.org/r1675771
>>>
>>>         Marc: this will not be in RC1, but please give the patch a
>>>         spin and let
>>>         me know if it fixes your problem. I tested this with the Java
>>>         program
>>>         you attached to your original report, and heap size no longer
>>>         grows
>>>         without bounds.
>>>
>>>
>>>     Great hunt, Brane! The native leak seems to be fixed. I've run my
>>>     remote status loop with -Xmx24M and still get an OOME after ~170
>>>     loop iterations. The memory leak is significantly smaller and this
>>>     time it seems to be in the Java part. According to the profiler,
>>>     most memory is allocated by HashMap and friends, referenced from
>>>     JNI code. Only two org.apache.subversion classes show up, but I
>>>     guess they indicate the source of the leak:
>>>
>>>     org.apache.subversion.javahl.types.Checksum (~10K instances)
>>>     org.apache.subversion.javahl.types.NativeInputStream (~10K
>>> instances)
>>>
>>>     Let me know, if you more profiler statistics will be helpful.
>>>
>>
>> So I've been looking at this in depth. At first I thought that one of
>> the problems was that we didn't release JNI local references; I added
>> code to make sure this happens in the status callbacks (not committed
>> yet) and I verified that all the native wrapped objects do get
>> finalized. However, the Java objects still hang around.
>>
>> One of the problems is that all the callbacks happen within the scope of
>> the ISVNReporter.finishReport call, which means that the whole edit
>> drive is considered a single JNI call (despite the callbacks to Java)
>> and the garbage collector can't reclaim space for the objects created
>> within JNI during that time. But even a forced GC after the report is
>> done and the remote session disposed won't release all the native
>> references. I'm a bit stumped here ... JVM's built-in memory profiler
>> shows the live references and where they're allocated, but doesn't show
>> why they're not released even when I explicitly create and destroy JNI
>> frames.
>
> Can you please commit your current state somewhere or send me a patch?
> I can give this one a try in JProfiler and see whether I can gather
> some more useful information.

Here's the complete patch against 1.9.x. I do see some NativeInputStream
objects (but not all) being garbage-collected, but there are a number of
other objects, even those allocated in Java code in the callback, that
are just hanging around, even if I force a GC. Any help in tracking this
down will be greatly appreciated.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Marc Strapetz <ma...@syntevo.com>.

Hi Brane,

On 28.04.2015 07:36, Branko Čibej wrote:
> On 24.04.2015 14:11, Branko Čibej wrote:
>>
>> Hi Marc,
>>
>> Just a quick note: your last msg jogged my memory and I think I know
>> the root cause of the leak: improper JNI frame management within a
>> loop. If I'm right, I can both fix the leak and remove the
>> close-stream requirement I just added.
>>
>> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz@syntevo.com
>> <ma...@syntevo.com>> wrote:
>>
>>     On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>>
>>         On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej wrote:
>>
>>             On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej wrote:
>>
>>                 On 19.03.2015 11:43, Marc Strapetz wrote:
>>
>>                     Attached example performs an endless series of
>>                     remote status against
>>                     the Subversion repository. When invoked with
>>                     -Xmx24M, the VM will run
>>                     out of memory soon. Monitoring with jvisualvm
>>                     shows that the used heap
>>                     size constantly grows. Monitoring with the Task
>>                     Manager shows that the
>>                     allocated memory grows even more (significantly).
>>                     Looks like a memory
>>                     leak, for which a large amount of native memory is
>>                     involved, too.
>>
>>                     Tested on Windows 8.1 with almost latest
>>                     Subversion 1.9 JavaHL builds.
>>
>>                 I can confirm that this happens on the Mac, too, and
>>                 it's not a garbage
>>                 collector artefact. I'm trying to trace where the leak
>>                 is happening ...
>>                 valgrind with APR pool debugging doesn't tell me much
>>                 (no surprise there).
>>
>>             Just to make sure we weren't doing something bad in our
>>             libraries, I
>>             wrote a small C program that does the same as your Java
>>             example (Ev2
>>             shims included), and memory usage is completely steady. So
>>             it is
>>             something in JavaHL, but I have no clue yet what the
>>             problem is.
>>
>>
>>         I have to say this was one of the more "interesting" bug-hunts
>>         in my not
>>         entirely boring career, and that's not really obvious from the fix
>>         itself. :)
>>
>>         http://svn.apache.org/r1675771
>>
>>         Marc: this will not be in RC1, but please give the patch a
>>         spin and let
>>         me know if it fixes your problem. I tested this with the Java
>>         program
>>         you attached to your original report, and heap size no longer
>>         grows
>>         without bounds.
>>
>>
>>     Great hunt, Brane! The native leak seems to be fixed. I've run my
>>     remote status loop with -Xmx24M and still get an OOME after ~170
>>     loop iterations. The memory leak is significantly smaller and this
>>     time it seems to be in the Java part. According to the profiler,
>>     most memory is allocated by HashMap and friends, referenced from
>>     JNI code. Only two org.apache.subversion classes show up, but I
>>     guess they indicate the source of the leak:
>>
>>     org.apache.subversion.javahl.types.Checksum (~10K instances)
>>     org.apache.subversion.javahl.types.NativeInputStream (~10K instances)
>>
>>     Let me know, if you more profiler statistics will be helpful.
>>
>
> So I've been looking at this in depth. At first I thought that one of
> the problems was that we didn't release JNI local references; I added
> code to make sure this happens in the status callbacks (not committed
> yet) and I verified that all the native wrapped objects do get
> finalized. However, the Java objects still hang around.
>
> One of the problems is that all the callbacks happen within the scope of
> the ISVNReporter.finishReport call, which means that the whole edit
> drive is considered a single JNI call (despite the callbacks to Java)
> and the garbage collector can't reclaim space for the objects created
> within JNI during that time. But even a forced GC after the report is
> done and the remote session disposed won't release all the native
> references. I'm a bit stumped here ... JVM's built-in memory profiler
> shows the live references and where they're allocated, but doesn't show
> why they're not released even when I explicitly create and destroy JNI
> frames.

Can you please commit your current state somewhere or send me a patch? I 
can give this one a try in JProfiler and see whether I can gather some 
more useful information.

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 24.04.2015 14:11, Branko Čibej wrote:
>
> Hi Marc,
>
> Just a quick note: your last msg jogged my memory and I think I know
> the root cause of the leak: improper JNI frame management within a
> loop. If I'm right, I can both fix the leak and remove the
> close-stream requirement I just added.
>
> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz@syntevo.com
> <ma...@syntevo.com>> wrote:
>
>     On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>
>         On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej wrote:
>
>             On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej wrote:
>
>                 On 19.03.2015 11:43, Marc Strapetz wrote:
>
>                     Attached example performs an endless series of
>                     remote status against
>                     the Subversion repository. When invoked with
>                     -Xmx24M, the VM will run
>                     out of memory soon. Monitoring with jvisualvm
>                     shows that the used heap
>                     size constantly grows. Monitoring with the Task
>                     Manager shows that the
>                     allocated memory grows even more (significantly).
>                     Looks like a memory
>                     leak, for which a large amount of native memory is
>                     involved, too.
>
>                     Tested on Windows 8.1 with almost latest
>                     Subversion 1.9 JavaHL builds.
>
>                 I can confirm that this happens on the Mac, too, and
>                 it's not a garbage
>                 collector artefact. I'm trying to trace where the leak
>                 is happening ...
>                 valgrind with APR pool debugging doesn't tell me much
>                 (no surprise there).
>
>             Just to make sure we weren't doing something bad in our
>             libraries, I
>             wrote a small C program that does the same as your Java
>             example (Ev2
>             shims included), and memory usage is completely steady. So
>             it is
>             something in JavaHL, but I have no clue yet what the
>             problem is.
>
>
>         I have to say this was one of the more "interesting" bug-hunts
>         in my not
>         entirely boring career, and that's not really obvious from the fix
>         itself. :)
>
>         http://svn.apache.org/r1675771
>
>         Marc: this will not be in RC1, but please give the patch a
>         spin and let
>         me know if it fixes your problem. I tested this with the Java
>         program
>         you attached to your original report, and heap size no longer
>         grows
>         without bounds.
>
>
>     Great hunt, Brane! The native leak seems to be fixed. I've run my
>     remote status loop with -Xmx24M and still get an OOME after ~170
>     loop iterations. The memory leak is significantly smaller and this
>     time it seems to be in the Java part. According to the profiler,
>     most memory is allocated by HashMap and friends, referenced from
>     JNI code. Only two org.apache.subversion classes show up, but I
>     guess they indicate the source of the leak:
>
>     org.apache.subversion.javahl.types.Checksum (~10K instances)
>     org.apache.subversion.javahl.types.NativeInputStream (~10K instances)
>
>     Let me know, if you more profiler statistics will be helpful.
>

So I've been looking at this in depth. At first I thought that one of
the problems was that we didn't release JNI local references; I added
code to make sure this happens in the status callbacks (not committed
yet) and I verified that all the native wrapped objects do get
finalized. However, the Java objects still hang around.

One of the problems is that all the callbacks happen within the scope of
the ISVNReporter.finishReport call, which means that the whole edit
drive is considered a single JNI call (despite the callbacks to Java)
and the garbage collector can't reclaim space for the objects created
within JNI during that time. But even a forced GC after the report is
done and the remote session disposed won't release all the native
references. I'm a bit stumped here ... JVM's built-in memory profiler
shows the live references and where they're allocated, but doesn't show
why they're not released even when I explicitly create and destroy JNI
frames.

Did I say that debugging JNI memory management is a pain?

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

Hi Marc,

Just a quick note: your last msg jogged my memory and I think I know the
root cause of the leak: improper JNI frame management within a loop. If I'm
right, I can both fix the leak and remove the close-stream requirement I
just added.
On 24 Apr 2015 11:00 am, "Marc Strapetz" <ma...@syntevo.com> wrote:

> On 24.04.2015 06:34, Branko Čibej wrote:
>
>> On 22.03.2015 05:06, Branko Čibej wrote:
>>
>>> On 21.03.2015 16:23, Branko Čibej wrote:
>>>
>>>> On 19.03.2015 11:43, Marc Strapetz wrote:
>>>>
>>>>> Attached example performs an endless series of remote status against
>>>>> the Subversion repository. When invoked with -Xmx24M, the VM will run
>>>>> out of memory soon. Monitoring with jvisualvm shows that the used heap
>>>>> size constantly grows. Monitoring with the Task Manager shows that the
>>>>> allocated memory grows even more (significantly). Looks like a memory
>>>>> leak, for which a large amount of native memory is involved, too.
>>>>>
>>>>> Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.
>>>>>
>>>> I can confirm that this happens on the Mac, too, and it's not a garbage
>>>> collector artefact. I'm trying to trace where the leak is happening ...
>>>> valgrind with APR pool debugging doesn't tell me much (no surprise
>>>> there).
>>>>
>>> Just to make sure we weren't doing something bad in our libraries, I
>>> wrote a small C program that does the same as your Java example (Ev2
>>> shims included), and memory usage is completely steady. So it is
>>> something in JavaHL, but I have no clue yet what the problem is.
>>>
>>
>> I have to say this was one of the more "interesting" bug-hunts in my not
>> entirely boring career, and that's not really obvious from the fix
>> itself. :)
>>
>> http://svn.apache.org/r1675771
>>
>> Marc: this will not be in RC1, but please give the patch a spin and let
>> me know if it fixes your problem. I tested this with the Java program
>> you attached to your original report, and heap size no longer grows
>> without bounds.
>>
>
> Great hunt, Brane! The native leak seems to be fixed. I've run my remote
> status loop with -Xmx24M and still get an OOME after ~170 loop iterations.
> The memory leak is significantly smaller and this time it seems to be in
> the Java part. According to the profiler, most memory is allocated by
> HashMap and friends, referenced from JNI code. Only two
> org.apache.subversion classes show up, but I guess they indicate the source
> of the leak:
>
> org.apache.subversion.javahl.types.Checksum (~10K instances)
> org.apache.subversion.javahl.types.NativeInputStream (~10K instances)
>
> Let me know, if you more profiler statistics will be helpful.
>
> -Marc
>

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Marc Strapetz <ma...@syntevo.com>.

On 24.04.2015 06:34, Branko Čibej wrote:
> On 22.03.2015 05:06, Branko Čibej wrote:
>> On 21.03.2015 16:23, Branko Čibej wrote:
>>> On 19.03.2015 11:43, Marc Strapetz wrote:
>>>> Attached example performs an endless series of remote status against
>>>> the Subversion repository. When invoked with -Xmx24M, the VM will run
>>>> out of memory soon. Monitoring with jvisualvm shows that the used heap
>>>> size constantly grows. Monitoring with the Task Manager shows that the
>>>> allocated memory grows even more (significantly). Looks like a memory
>>>> leak, for which a large amount of native memory is involved, too.
>>>>
>>>> Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.
>>> I can confirm that this happens on the Mac, too, and it's not a garbage
>>> collector artefact. I'm trying to trace where the leak is happening ...
>>> valgrind with APR pool debugging doesn't tell me much (no surprise there).
>> Just to make sure we weren't doing something bad in our libraries, I
>> wrote a small C program that does the same as your Java example (Ev2
>> shims included), and memory usage is completely steady. So it is
>> something in JavaHL, but I have no clue yet what the problem is.
>
> I have to say this was one of the more "interesting" bug-hunts in my not
> entirely boring career, and that's not really obvious from the fix
> itself. :)
>
> http://svn.apache.org/r1675771
>
> Marc: this will not be in RC1, but please give the patch a spin and let
> me know if it fixes your problem. I tested this with the Java program
> you attached to your original report, and heap size no longer grows
> without bounds.

Great hunt, Brane! The native leak seems to be fixed. I've run my remote 
status loop with -Xmx24M and still get an OOME after ~170 loop 
iterations. The memory leak is significantly smaller and this time it 
seems to be in the Java part. According to the profiler, most memory is 
allocated by HashMap and friends, referenced from JNI code. Only two 
org.apache.subversion classes show up, but I guess they indicate the 
source of the leak:

org.apache.subversion.javahl.types.Checksum (~10K instances)
org.apache.subversion.javahl.types.NativeInputStream (~10K instances)

Let me know, if you more profiler statistics will be helpful.

-Marc

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Julian Foad <ju...@btopenworld.com>.

Branko Čibej wrote:
> There were two options for a fix:
>
>     dispose of the object from the native code when it goes out of scope; or,
>     close the stream (causing an implicit disposal) from Java code.
>
> Picking the former would mean that the Java code must not close the stream,
> since doing so would result in a crash. Picking the latter means that
> forgetting to close the stream results in a memory leak. Faced with this
> dilemma, I decided that a potential memory leak with documented workaround
> is the less horrible option.
>
> Of course it's clear that memory management in JavaHL needs a serious
> overhaul, but this is not the time. Also, at some later date we may be able
> to remove the requirement to close the stream whilst making closing it
> always safe. Given the current JavaHL RA API, the only (Java) editor
> implementation likely to exist for a while is the one within JavaHL itself
> that's hidden under the hood of the status receiver.

Thanks, Brane. While there is clearly some interesting longer term
work to do, and while I don't completely follow all the details, that
helps understand the current situation.

- Julian

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 24.04.2015 10:15, Branko Čibej wrote:
> There were two options for a fix:
>
>   * dispose of the object from the native code when it goes out of
>     scope; or,
>   * close the stream (causing an implicit disposal) from Java code.
>
> Picking the former would mean that the Java code must not close the
> stream, since doing so would result in a crash. Picking the latter
> means that forgetting to close the stream results in a memory leak.
> Faced with this dilemma, I decided that a potential memory leak with
> documented workaround is the less horrible option.
>

Of course it's clear that memory management in JavaHL needs a serious
overhaul, but this is not the time. Also, at some later date we may be
able to remove the requirement to close the stream whilst making closing
it always safe. Given the current JavaHL RA API, the only (Java) editor
implementation likely to exist for a while is the one within JavaHL
itself that's hidden under the hood of the status receiver.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 24.04.2015 09:56, Julian Foad wrote:
> Hi Brane.
>
> I'm sure you would have first looked for ways to make this happen
> automatically or not be needed. Is there anything you can report on
> that front, for future reference?

You mean, apart from in the log message?

The root of the "problem" lies in the way JavaHL wraps native objects
within Java objects, by storing a pointer to the underlying native C++
object in a private variable of the Java object. Since Java doesn't have
destructors and we can't reliably use the finalizer to clean up the
native object, we invent a dispose() method that Java code must call
when it's done using wrapper. This worked moderately well up to now.

In the case of the status editor, the native code itself creates a Java
stream to pass to the editor implementation and did not call the dispose
method. Hence, the native object was never deallocated and heap usage
kept growing. (Even though there's theoretically a mechanism in JavaHL
to detect this, it's not always reliable, due to the way the JVM garbage
collector works.)

There were two options for a fix:

  * dispose of the object from the native code when it goes out of
    scope; or,
  * close the stream (causing an implicit disposal) from Java code.

Picking the former would mean that the Java code must not close the
stream, since doing so would result in a crash. Picking the latter means
that forgetting to close the stream results in a memory leak. Faced with
this dilemma, I decided that a potential memory leak with documented
workaround is the less horrible option.

FWIW, I only noticed this because the wrapped native object contains a
pool that was never destroyed, and analyzing APR pool debug logs in
combination with some stack trace generating magic finally pointed at
the actual line of code that was the culprit. Without that, finding the
missing 'delete' would be that much harder. Running the Java test
program though Valgrind, for example, was an exercise in futility ...
Valgrind never came close to noticing the leak, even with APR pool
debugging enabled.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Julian Foad <ju...@btopenworld.com>.

Hi Brane.

I'm sure you would have first looked for ways to make this happen
automatically or not be needed. Is there anything you can report on
that front, for future reference?

- Julian

Branko Čibej wrote:
> I have to say this was one of the more "interesting" bug-hunts in my not
> entirely boring career, and that's not really obvious from the fix
> itself. :)
>
> http://svn.apache.org/r1675771

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 22.03.2015 05:06, Branko Čibej wrote:
> On 21.03.2015 16:23, Branko Čibej wrote:
>> On 19.03.2015 11:43, Marc Strapetz wrote:
>>> Attached example performs an endless series of remote status against
>>> the Subversion repository. When invoked with -Xmx24M, the VM will run
>>> out of memory soon. Monitoring with jvisualvm shows that the used heap
>>> size constantly grows. Monitoring with the Task Manager shows that the
>>> allocated memory grows even more (significantly). Looks like a memory
>>> leak, for which a large amount of native memory is involved, too.
>>>
>>> Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.
>> I can confirm that this happens on the Mac, too, and it's not a garbage
>> collector artefact. I'm trying to trace where the leak is happening ...
>> valgrind with APR pool debugging doesn't tell me much (no surprise there).
> Just to make sure we weren't doing something bad in our libraries, I
> wrote a small C program that does the same as your Java example (Ev2
> shims included), and memory usage is completely steady. So it is
> something in JavaHL, but I have no clue yet what the problem is.

I have to say this was one of the more "interesting" bug-hunts in my not
entirely boring career, and that's not really obvious from the fix
itself. :)

http://svn.apache.org/r1675771

Marc: this will not be in RC1, but please give the patch a spin and let
me know if it fixes your problem. I tested this with the Java program
you attached to your original report, and heap size no longer grows
without bounds.

I'll make sure we port this to 1.9.x in time for GA.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 21.03.2015 16:23, Branko Čibej wrote:
> On 19.03.2015 11:43, Marc Strapetz wrote:
>> Attached example performs an endless series of remote status against
>> the Subversion repository. When invoked with -Xmx24M, the VM will run
>> out of memory soon. Monitoring with jvisualvm shows that the used heap
>> size constantly grows. Monitoring with the Task Manager shows that the
>> allocated memory grows even more (significantly). Looks like a memory
>> leak, for which a large amount of native memory is involved, too.
>>
>> Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.
> I can confirm that this happens on the Mac, too, and it's not a garbage
> collector artefact. I'm trying to trace where the leak is happening ...
> valgrind with APR pool debugging doesn't tell me much (no surprise there).

Just to make sure we weren't doing something bad in our libraries, I
wrote a small C program that does the same as your Java example (Ev2
shims included), and memory usage is completely steady. So it is
something in JavaHL, but I have no clue yet what the problem is.

-- Brane

Re: 1.9 JavaHL memory leak in ISVNRemote#status

Posted by Branko Čibej <br...@wandisco.com>.

On 19.03.2015 11:43, Marc Strapetz wrote:
> Attached example performs an endless series of remote status against
> the Subversion repository. When invoked with -Xmx24M, the VM will run
> out of memory soon. Monitoring with jvisualvm shows that the used heap
> size constantly grows. Monitoring with the Task Manager shows that the
> allocated memory grows even more (significantly). Looks like a memory
> leak, for which a large amount of native memory is involved, too.
>
> Tested on Windows 8.1 with almost latest Subversion 1.9 JavaHL builds.

I can confirm that this happens on the Mac, too, and it's not a garbage
collector artefact. I'm trying to trace where the leak is happening ...
valgrind with APR pool debugging doesn't tell me much (no surprise there).

-- Brane