You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@tvm.apache.org by 雾雨魔理沙 <no...@github.com> on 2019/06/23 22:25:42 UTC

[dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

right now relay aot/interpreter/vm all use reference counting to collect memory. However, it is possible to create cyclic data structure in relay, as demonstrated below:

data Loop = Ref (Optional Loop)
x : Loop = Ref None
match x with
|  r -> r := x
end

in short, we can has a data type holding a mutable (nullable) reference, initialize it to null, then point it to itself.

This example is purely contrived, but it is completely possible in real, meaningful relay code: imagine a doubly linked list, or two closure referencing each other. They all form cyclic link data which will never be collected by reference counting.

there are three problem we should discuss:

0: what algorithm should we use? mark and sweep/generational, etc?

1: should we still use reference counting?

2: how can we implement the runtime only once, for aot/interpreter/vm, instead of each of them having to implement the runtime itself?

maybe we can look at https://github.com/hsutter/gcpp?

@jroesch @tqchen @icemelon9 @wweic @junrushao1994 @nhynes any suggestion?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by 雾雨魔理沙 <no...@github.com>.

@ajtulloch indeed. However, it will always be possible that ppl might uncarefully create strong reference loop, and be unable to collect. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-505184458

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by hlu1 <no...@github.com>.

@MarisaKirisame, https://github.com/dmlc/tvm/pull/3448 was the leak @ajtulloch was referring to. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-506540526

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by Tianqi Chen <no...@github.com>.

Seems we agreed that weak reference was better than gc. close this RFC thread for now. Thanks everyone who participated in the discussion

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-508954422

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by Wei Chen <no...@github.com>.

Thanks for bringing up this topic. We might need to consider the scenarios we want to use relay and design under those constraints. For now the main use cases is inference(maybe training in the future). I guess most of the objects allocated during inference will be `TensorObject`, plus a few `DatatypeObject`, we can run some benchmarks for major models to confirm. `TensorObject` shouldn't cause cyclic reference, and if it's the majority of the objects, we can handle them with existing reference counting. And for the remaining `DatatypeObject`, a mark sweep GC can be called infrequently. But **benchmark first**. :-)

Agree sharing GC. For AOT, should we learn how rust get rid of GC as a further optimization? 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-504807018

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by Andrew Tulloch <no...@github.com>.

cc @hlu1 re: the leak from e.g. recursive functions taking a reference to itself in the environment.

FWIW, can we solve these via adding the concept of weak references to the node system? It seems like in these cases that closures could use weak references to other closures, and have the higher-level env maintain the strong reference to the closure objects themselves?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-505173935

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by Tianqi Chen <no...@github.com>.

Closed #3423.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#event-2464108976

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by Tianqi Chen <no...@github.com>.

Personally, I am not in favor of introducing GC to the system. Reference counting was fine and is great for handling external resources(GPU memories etc). Exact for the same reason, languages like java/scala had a pretty bad time working with GPU memories(memories not being de-allocated precisely at the point where data structure goes out of scope).

Ref counting has its limitation, but we can be mindful to avoid cycles and it works great for most of the cases we are looking at, without introducing the additional problem of GC

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-504839026

Re: [dmlc/tvm] [Relay][RFC] Garbage Collection (#3423)

Posted by Jared Roesch <no...@github.com>.

I talked with Zach DeVito from PyTorch team for a while about RefCounting, there are quite a few benefits to using reference counting. We should probably just use weak refs, solutions from Counting Immutable Beans (a recent paper by my MSR collaborator where they do much better than GC languages in perf) or cycle detectors if we are worried about cycles. I think generally RC provides the best tradeoffs for ML right now.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3423#issuecomment-507440038