You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@thrift.apache.org by "Mithun Radhakrishnan (Created) (JIRA)" <ji...@apache.org> on 2011/12/16 09:10:30 UTC

[jira] [Created] (THRIFT-1468) Memory leak in TSaslServerTransport

Memory leak in TSaslServerTransport
-----------------------------------

Key: THRIFT-1468
URL: https://issues.apache.org/jira/browse/THRIFT-1468
Project: Thrift
Issue Type: Bug
Components: Java - Library
Affects Versions: 0.5, 0.9
Reporter: Mithun Radhakrishnan

I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)

We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.

I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).

>From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.

>From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.

(I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)

The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170825#comment-13170825 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

HCATALOG-183 depends on this. If this is fixed, it would be great to have the fix available for 0.5 and 0.7. 

Thanks.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174972#comment-13174972 ] 

Aaron T. Myers commented on THRIFT-1468:
----------------------------------------

bq. I can confirm that the newest patch works. (This patch also has the comments you'd requested for.) Ran overnight, and the latest heap-dumps don't indicate a build-up of WeakHashMap$Entry objects.

That's great! Thanks a lot for testing that out, Mithun. The patch and the comment you added look good to me.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated THRIFT-1468:
-----------------------------------------

    Attachment: Main.java

Uploading test-code that illustrates how the WeakHashMap might not be working. (The proposed fix is also in the file.)
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171057#comment-13171057 ] 

Todd Lipcon commented on THRIFT-1468:
-------------------------------------

By chance are you using the CMS GC? There was a bug recently in CMS where some types of references would fail to be collected. You can try -XX:-CMSConcurrentMTEnabled and see if it fixes the issue
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174923#comment-13174923 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

I can confirm that the newest patch works. (This patch also has the comments you'd requested for.) Ran overnight, and the latest heap-dumps don't indicate a build-up of WeakHashMap$Entry objects.

Would it be possible to get this patch into a build of 0.5.0 (on a maven-repository), and not just 0.9? That'd let Hive and HCatalog take advantage of this fix.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Bryan Duxbury (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174301#comment-13174301 ] 

Bryan Duxbury commented on THRIFT-1468:
---------------------------------------

@Aaron - I can see how your explanation handles the race condition, but it might be nice to put a little comment in the code to clarify that, since it's likely to be a mistake people make again.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172645#comment-13172645 ] 

Todd Lipcon commented on THRIFT-1468:
-------------------------------------

I see now what you're saying. Sorry for the red herring JVM bug.

Rather than introducing the weakreference in TSaslServerTransport, I think it would make more sense to have the map use WeakReference<TSaslServerTransport> as values -- that keeps the TSaslTransport implementation from having to work around the implementation details of the factory
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Bryan Duxbury (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174930#comment-13174930 ] 

Bryan Duxbury commented on THRIFT-1468:
---------------------------------------

Hm, I'm not too excited at the prospect of releasing a new 0.5 with this fix. If the patch applies cleanly to 0.5, it might be possible, but I'd much rather encourage projects to move forward in versions instead.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172621#comment-13172621 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

@Bryan, @Todd: I'm sorry to say that the twiddling the GC options does not fix the issue. I'm not completely done with the testing, but so far, it doesn't look like this can be worked around. (Thank you for the suggestion, though.)

Please consider the implementation note in the WeakHashMap's javadoc:
http://docs.oracle.com/javase/6/docs/api/java/util/WeakHashMap.html
"... care should be taken to ensure that value objects do not strongly refer to their own keys, either directly or indirectly, since that will prevent the keys from being discarded. "

This is precisely the condition we're facing. Introducing the WeakReference would fix this. (My tests are still running.)
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172783#comment-13172783 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

Right, I agree. That's cleaner. (Sorry. It didn't occur to me till this morning.)
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated THRIFT-1468:
-----------------------------------------

    Attachment: THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch

Uploading new patch (with TSaslServerTransport's transportMap holding values as weak-references).

Tests are ongoing. I'll post results.

In the meantime, Aaron, could I please enquire where the return-value from getTransport() is cached? Were you referring to TThreadPoolServer and TSimpleServer?
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated THRIFT-1468:
-----------------------------------------

    Attachment: thrift-1468.patch

Uploading the patch with the proposed fix. (Changing the back-reference to a WeakReference.)
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174966#comment-13174966 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

@Bryan: Hey, I had to try. ;] Thank you for your candour.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173896#comment-13173896 ] 

Aaron T. Myers commented on THRIFT-1468:
----------------------------------------

bq. On second thoughts, I'm not sure if that's ideal. Remember: we need the WeakHashMap because client-code (of the factory) caches the key, i.e. the underlying TTransport object, and not the value (returned from the factory).

I haven't looked at the code in a while, but I'm pretty sure that's not the case. The reason for the hash map is because the TServer implementations don't reuse the same TTransport instance for input and output, even when the same TTransportFactory instance is used for both the input and output factory. This is a requirement for the TSaslTransports. Once TSaslServerTransport#Factory#getTransport is called once, the result is indeed cached by the client-code of the factory - the TServer implementations. So, there will be a hard reference to the value, and that will in turn have a hard reference to the key.

I agree with Todd that storing a WeakReference<TSaslServerTransport> should solve the problem. Mithun, would you mind testing this out?
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175055#comment-13175055 ] 

Hudson commented on THRIFT-1468:
--------------------------------

Integrated in Thrift #372 (See [https://builds.apache.org/job/Thrift/372/])
    THRIFT-1468. java: Memory leak in TSaslServerTransport

This patch changes a particular map element to a WeakReference and thus avoids some awkward un-GC-ableness.

Patch: Mithun Radhakrishnan"

bryanduxbury : http://svn.apache.org/viewvc/?view=rev&rev=1222397
Files : 
* /thrift/trunk/lib/java/src/org/apache/thrift/transport/TSaslServerTransport.java

                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>             Fix For: 0.9
>
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Bryan Duxbury (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172405#comment-13172405 ] 

Bryan Duxbury commented on THRIFT-1468:
---------------------------------------

I'd love to hear if that switch Todd suggests fixes the problem.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173910#comment-13173910 ] 

Ashutosh Chauhan commented on THRIFT-1468:
------------------------------------------

On a semi-related note, it appears to me that there is a race condition in TSaslServerTransport::getTransport()
{code}
TSaslServerTransport ret = transportMap.get(base);
      if (ret == null) {
        LOGGER.debug("transport map does not contain key", base);
        ret = new TSaslServerTransport(serverDefinitionMap, base);
        try {
          ret.open();
        } catch (TTransportException e) {
        }
        transportMap.put(base, ret);
      } else {
        LOGGER.debug("transport map does contain key {}", base);
      }
      return ret;
{code}
transport map is synchronized, but I think putIfAbsent() semantics is needed here instead of null checking and then adding.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173728#comment-13173728 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

On second thoughts, I'm not sure if that's ideal. Remember: we need the WeakHashMap because client-code (of the factory) caches the key, i.e. the underlying TTransport object, and not the value (returned from the factory).

So if we make the value a WeakReference, the TSaslTransport might be aggressively collected *before* the key has been collected. The next time the factory is invoked, a fresh TSaslTransport would have to be set up. It'll work, but there is a perf-hit.

My way, there's no way the TSaslTransport can be collected before the underlying TTransport is collected. (That's because the users of the factory already have a handle to the TTransport. If it didn't, we wouldn't even be using the cache.) The safety checks I've put in TSaslTransport are "just in case".

I can confirm that this fix does fix the mem-leak. It would be good if the patch could please be reviewed for correctness.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Closed] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Bryan Duxbury (Closed) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury closed THRIFT-1468.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9
         Assignee: Mithun Radhakrishnan

I just committed this.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>             Fix For: 0.9
>
>         Attachments: Main.java, THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Mithun Radhakrishnan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173912#comment-13173912 ] 

Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

Thank you for clarifying, Aaron. I might have misread the code that uses TSST::Factory::getTransport(). I could try out changing to a WeakReference, and post results shortly.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173913#comment-13173913 ] 

Aaron T. Myers commented on THRIFT-1468:
----------------------------------------

bq. On a semi-related note, it appears to me that there is a race condition in TSaslServerTransport::getTransport()

I disagree. Again, the only reason for the map is to make sure that a single given TServer instance gets the same TSaslServerTransport instance for the input and output transports on successive calls to TSaslServerTransport#Factory#getTransport. Thus, we can conclude that concurrent callers of getTransport() will always be passing different base transports.

See THRIFT-876 for most of the history on the implementation of the TSasl* classes.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (THRIFT-1468) Memory leak in TSaslServerTransport

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174332#comment-13174332 ] 

Aaron T. Myers commented on THRIFT-1468:
----------------------------------------

@Bryan - I totally agree. Mithun, would you mind adding a comment to that affect to whatever patch you post for this issue? If not, I'll file a separate JIRA to add the comment.
                
> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira