You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/21 23:24:21 UTC

[GitHub] [tvm] petersalas opened a new pull request, #11094: [CONTRIB] Add PopenWorker process recycling

petersalas opened a new pull request, #11094:
URL: https://github.com/apache/tvm/pull/11094

   Background processes (e.g. builds during autotuning) can sometimes accumulate state or leak memory. This adds the ability for `PopenWorker` processes to optionally be recycled after a specified number of uses.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1106021581

   Do we have an existing issue for the memory leak stuff? I think based on some internal data we suspect it's in `tvm.lower` somewhere. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] shingjan commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
shingjan commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855746519


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -92,12 +92,18 @@ class PopenWorker:
 
     initargs: Tuple[object]
         A tuple of args for the initializer
+
+    maximum_uses: int or None

Review Comment:
   I think we can do Optional[int] here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] junrushao1994 commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1106788665

   LGTM too! BTW, if `maximum_uses` is set, while the process is run async (e.g. MetaSchedule RPCRunner), we might want some logic to make sure synchronization happens before recycling the child processes, which is beyond the scope of this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855712750


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -92,12 +92,18 @@ class PopenWorker:
 
     initargs: Tuple[object]
         A tuple of args for the initializer
+
+    maximum_uses: int or None
+        The maximum number of times a process can be used before being recycled

Review Comment:
   Hmm, should there be more warning about what "recycling" means? 
   
   Like what if someone has a very stateful function and moving things to init state is undesirable / unexpected



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1105930437

   Wonder what others think though? @shingjan ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] junrushao1994 commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1105992319

   That's exactly one issue that we recently identified. Thank you so much for this fix! Also CC @tqchen as the original author of PopenWorker


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855708206


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -133,7 +139,11 @@ def kill(self):
                 self._proc.kill()
             except OSError:
                 pass
+
+            # Join the child process to avoid zombie processes
+            self.join(timeout=1.0)

Review Comment:
   is this needed? since when we remove reference to _proc python will probably call wait()



##########
python/tvm/contrib/popen_pool.py:
##########
@@ -92,12 +92,18 @@ class PopenWorker:
 
     initargs: Tuple[object]
         A tuple of args for the initializer
+
+    maximum_uses: int or None
+        The maximum number of times a process can be used before being recycled

Review Comment:
   Hmm, should there be more warning about what "recycling" means? 
   
   Like what if someone has a very stateful function and moving things to init state is undesirable.



##########
python/tvm/contrib/popen_pool.py:
##########
@@ -92,12 +92,18 @@ class PopenWorker:
 
     initargs: Tuple[object]
         A tuple of args for the initializer
+
+    maximum_uses: int or None
+        The maximum number of times a process can be used before being recycled

Review Comment:
   nit: indicate what none means in representation throughout file



##########
python/tvm/contrib/popen_pool.py:
##########
@@ -213,12 +223,19 @@ def send(self, fn, args=(), kwargs=None, timeout=None):
         # pylint: disable=import-outside-toplevel
         import cloudpickle
 
+        if self._proc is not None and self._maximum_uses and self._remaining_uses == 0:

Review Comment:
   we can move this block below lines 248-250 and remove self._proc check. It will also immediately free resources after last remaining use instead of needing additional send call.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] junrushao1994 commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1107073399

   Thanks @petersalas for bringing this workaround, and @AndrewZhaoLuo @shingjan @zxybazh for review and discussion :-))


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855732811


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -133,7 +139,11 @@ def kill(self):
                 self._proc.kill()
             except OSError:
                 pass
+
+            # Join the child process to avoid zombie processes
+            self.join(timeout=1.0)

Review Comment:
   Hmm, so freeing the pid entry is non-deterministic in the other way. I see. I think being this explicit has definite advantages then.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] petersalas commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
petersalas commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855727390


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -213,12 +223,19 @@ def send(self, fn, args=(), kwargs=None, timeout=None):
         # pylint: disable=import-outside-toplevel
         import cloudpickle
 
+        if self._proc is not None and self._maximum_uses and self._remaining_uses == 0:

Review Comment:
   We can't quite do that because we have to wait for the `recv` call (at the start of `send` is "ok" only by relying on the contract above):
   
   ```
           The caller must call recv before calling the next send in
           order to make sure the timeout and child process exit
           won't affect the later requests.
   ```
   
   But, I can move this to the end of `recv` if you'd like? I'm indifferent -- I put it in `send` to try consolidate (most) of the logic in one function but I certainly don't feel strongly about it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] petersalas commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
petersalas commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1106652698

   @shingjan to be clear this change alone isn't enough to mitigate the leaks because the default is "never recycle" unless operations timeout. But specifying a value when initializing `PopenPoolExecutor` ensures that the background processes are killed periodically which prevents any per-operation leaks from accumulating too much. I used 32 in my test, which worked fine.
   
   Of course, we should _also_ try to fix the leaks :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] zxybazh commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
zxybazh commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1105976806

   Very interesting, we observe the same behaviour with PopenWorker during tuning and our workaround is also very similar. Would be happy to learn about how you guys figured out the problem and do you have any trace on the root cause or potential fix. A minimum reproducible example would also be appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855708206


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -133,7 +139,11 @@ def kill(self):
                 self._proc.kill()
             except OSError:
                 pass
+
+            # Join the child process to avoid zombie processes
+            self.join(timeout=1.0)

Review Comment:
   is this needed? since when we remove reference to _proc python will probably call wait() in garbage collection



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] zxybazh commented on pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
zxybazh commented on PR #11094:
URL: https://github.com/apache/tvm/pull/11094#issuecomment-1106024382

   Just created one #11096.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] petersalas commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
petersalas commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855730149


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -133,7 +139,11 @@ def kill(self):
                 self._proc.kill()
             except OSError:
                 pass
+
+            # Join the child process to avoid zombie processes
+            self.join(timeout=1.0)

Review Comment:
   Without it the behavior is (very) non-deterministic (TIL if the process hasn't exited when its `__del__` runs then it gets added to a list(!) that gets polled when future `Popen`s are created) -- but indeed it should eventually get reaped. (Granted there's no guarantee that it exits 1s after SIGKILL either but I was reluctant to block on it forever...)
   
   Having spent time debugging accumulating zombie processes in the past I'm generally loathe to leave them...but I'm good with removing it. LMK what you think



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo commented on code in PR #11094:
URL: https://github.com/apache/tvm/pull/11094#discussion_r855728721


##########
python/tvm/contrib/popen_pool.py:
##########
@@ -213,12 +223,19 @@ def send(self, fn, args=(), kwargs=None, timeout=None):
         # pylint: disable=import-outside-toplevel
         import cloudpickle
 
+        if self._proc is not None and self._maximum_uses and self._remaining_uses == 0:

Review Comment:
   Ah I see, nah let's keep it consolidated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] AndrewZhaoLuo merged pull request #11094: [CONTRIB] Add PopenWorker process recycling

Posted by GitBox <gi...@apache.org>.
AndrewZhaoLuo merged PR #11094:
URL: https://github.com/apache/tvm/pull/11094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org