Posted to dev@avro.apache.org by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org> on 2009/11/18 14:24:39 UTC

[jira] Commented: (AVRO-210) Memory leak with recursive schemas when constructed by hand

    [ https://issues.apache.org/jira/browse/AVRO-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779440#action_12779440 ] 

Thiruvalluvan M. G. commented on AVRO-210:
------------------------------------------

Any scheme that relies solely on reference counting will leak when there are circular references. One partial solution is to use Boost's shared_ptr and weak_ptr. There are two kinds of references between nodes in a schema: parent-to-child references and symbolic references. We can use shared_ptr for the references from parents to children and weak_ptr for symbolic references. The ownership graph then has no cycles, so nothing leaks.
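
To make the proposal concrete, here is a minimal sketch of the ownership scheme. It is not the Avro code: Node, makeLongList and nodesLeakedAfter are hypothetical names made up for illustration, and std::shared_ptr / std::weak_ptr stand in for the Boost equivalents, which I am assuming behave the same way for this purpose.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema node; not Avro's actual Node class.
struct Node {
    explicit Node(const std::string& n) : name(n) { ++liveCount(); }
    ~Node() { --liveCount(); }
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    std::string name;
    std::vector<std::shared_ptr<Node>> children;  // parent -> child: owning
    std::weak_ptr<Node> symbolic;                  // symbolic reference: non-owning

    // Counts live Node objects so that a leak becomes observable.
    static int& liveCount() { static int count = 0; return count; }
};

// Builds a tree shaped like the recursive "LongList" record:
// LongList { value: long, next: union { null, LongList } }.
std::shared_ptr<Node> makeLongList() {
    auto rec = std::make_shared<Node>("LongList");
    rec->children.push_back(std::make_shared<Node>("long"));

    auto next = std::make_shared<Node>("union");
    next->children.push_back(std::make_shared<Node>("null"));

    // The recursive mention of LongList is a separate node that only
    // points back weakly, so there is no ownership cycle.
    auto self = std::make_shared<Node>("LongList (symbolic)");
    self->symbolic = rec;
    next->children.push_back(self);

    rec->children.push_back(next);
    return rec;
}

// Builds and drops the schema `iterations` times and returns the number
// of nodes that are still alive afterwards.
int nodesLeakedAfter(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::shared_ptr<Node> root = makeLongList();
        assert(root->children.size() == 2);
    }  // root goes out of scope here and the whole tree is destroyed
    return Node::liveCount();
}
```

With this layout, nodesLeakedAfter(1000) should return 0. If the symbolic member were a shared_ptr instead, every iteration would leave its five nodes behind.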

But this solution has one problem in multi-threaded situations. If one thread holds an intermediate node n1 in a temporary (say, during a schema walk) and another thread deletes the "root" node, all nodes that are ancestors of n1 will be destroyed. One of these destroyed nodes could be the target of a weak pointer held by a node below n1, and that weak pointer then becomes invalid. So the thread that is doing the schema walk will not get the whole picture.
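
The scenario can be reproduced in a single thread by replaying the interleaving in order. This is only a sketch under the same assumptions as before: WalkNode and symbolicLinkSurvivesRootRelease are hypothetical names, and std::shared_ptr / std::weak_ptr stand in for the Boost types.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node type, same ownership scheme as proposed above.
struct WalkNode {
    std::vector<std::shared_ptr<WalkNode>> children;  // owning
    std::weak_ptr<WalkNode> symbolic;                  // non-owning
};

// Returns true if the symbolic link below the held node can still be
// resolved after the root has been released.
bool symbolicLinkSurvivesRootRelease() {
    auto root = std::make_shared<WalkNode>();
    auto n1 = std::make_shared<WalkNode>();
    auto leaf = std::make_shared<WalkNode>();

    leaf->symbolic = root;         // leaf refers back to the root by symbol
    n1->children.push_back(leaf);  // n1 owns leaf
    root->children.push_back(n1);  // root owns n1
    leaf.reset();                  // from here on only the tree owns leaf

    // "Thread A" keeps a temporary on the intermediate node during a walk.
    std::shared_ptr<WalkNode> held = n1;
    n1.reset();

    // "Thread B" drops the last reference to the root.
    root.reset();

    // held keeps n1 and its subtree alive, but the root itself is gone,
    // so locking the weak pointer yields an empty shared_ptr.
    assert(held->children.size() == 1);
    return held->children[0]->symbolic.lock() != nullptr;
}
```

The function returns false: the walker still has a valid subtree, but the symbolic reference inside it no longer resolves. Code that follows symbolic links would therefore need to check the result of lock() before using it.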

I suppose this will not be a big problem and we can live with it.

If there are no big objections to this approach, I'll submit a patch.

> Memory leak with recursive schemas when constructed by hand
> -----------------------------------------------------------
>
>                 Key: AVRO-210
>                 URL: https://issues.apache.org/jira/browse/AVRO-210
>             Project: Avro
>          Issue Type: Bug
>          Components: c++
>            Reporter: Thiruvalluvan M. G.
>
> A schema consists of one or more nodes. These nodes are referenced through intrusive pointers (NodePtr). Since intrusive pointers use reference counts, recursive schemas, which result in cycles of intrusive pointers, lead to a memory leak. The following code, when compiled and run, causes the memory to grow steadily:
> {code:title=test.cc|borderStyle=solid}
> #include <unistd.h>
> #include "Schema.hh"
> int main(int argc, char** argv)
> {
>     const int count1 = 10;
>     const int count2 = 1000;
>     for (int i = 0; i < count1; i++) {
>         for (int j = 0; j < count2; j++) {
>             avro::RecordSchema rec("LongList");
>             rec.addField("value", avro::LongSchema());
>             avro::UnionSchema next;
>             next.addType(avro::NullSchema());
>             next.addType(rec);
>             rec.addField("next", next);
>             rec.addField("end", avro::BoolSchema());
>         }
>         sleep(1);
>     }
> }
> {code}
> The leak should not happen when we build the schema by parsing a JSON schema file. This is because the current implementation does not use pointers for symbolic links; it uses symbols, and a symbol table resolves them at runtime. Unfortunately, parsing the nested schema file generates an error. I'll file a separate JIRA for that.
